Jacob Burbea
ABSTRACT


The paper is concerned with the geometrical properties that are
induced by the local information contents and structures of the
parameter space of probability distributions. Of particular interest in
this investigation is the Rao distance, which is the geodesic distance
induced by the differential metric associated with the Fisher
information matrix of the parameter space. Moreover, following Efron,
Dawid and Amari, some affine connections are introduced into the
informative geometry of parameter space, thereby elucidating the role of
the curvature in statistical studies. In addition, closed form
expressions of the Rao distances for certain families of probability
distributions are given and discussed.




of  Probability  Spaces 


by  Jacob  Burbea 

Metrics and distances (or semi-distances) between probability
distributions play an important role in problems of statistical
inference and in practical applications to study affinities among a
given set of populations. A statistical model is specified by a family
of probability distributions, usually described by a set of continuous
parameters known as the parameter space. The latter possesses some
geometrical properties which are induced by the local information
contents and structures of the distributions. Starting from Fisher's
pioneering work [17] in 1925, the study of these geometrical properties
has received much attention in the statistical literature. In 1945, Rao
[24] introduced a Riemannian metric in terms of the Fisher information
matrix over the parameter space of a parametric family of probability
distributions, and proposed the geodesic distance induced by the metric
as a measure of dissimilarity between two probability distributions.
Since then, many statisticians have attempted to construct a
geometrical theory in probability spaces, and it was only thirty years
later that Efron [13] was able to introduce a new affine connection
into the geometry of parameter spaces, thereby elucidating the
important role of the curvature in statistical studies. Significant
contributions to Efron's work were made by Reeds [28] and Dawid [11].
The latter has even suggested a geometrical foundation for Efron's
approach as well as pointing out the possibility of introducing other
affine connections into the geometry of parameter spaces (see also
Amari [1,2]). This recent study has also revived the interest in
dissimilarity measures like the Rao distance [25], especially in the
closed form expressions of these distances for certain families of
probability distributions. Some work in these directions was done
earlier by Čencov [9,10]. Recently, Atkinson and Mitchell [3],
independently of Čencov [9,10], computed the Rao distances for a number
of parametric families of probability distributions. A unified approach
to the construction of distance and dissimilarity measures in
probability spaces is given in recent papers by Burbea and Rao [7,8],
and Oller and Cuadras [22] (see also [6]).

1. Generalities.

We first introduce some notation. Let $\mu$ be a $\sigma$-finite
additive measure, defined on a $\sigma$-algebra of the subsets of a
measurable space $\chi$. Then $M=M(\chi:\mu)$ stands for the space of
all $\mu$-measurable functions on $\chi$, and $L=L(\chi:\mu)$
designates the space of all $p\in M$ so that

$$\|p\|_\mu=\int_\chi |p(x)|\,d\mu(x)=\int_\chi |p|\,d\mu<\infty.$$

By $M_+=M_+(\chi:\mu)$ we denote the set of all $p\in M$ such that
$p(x)\in\mathbb{R}_+=(0,\infty)$ for $\mu$-almost all $x\in\chi$, and we
define $L_+=L_+(\chi:\mu)$ as $L_+=M_+\cap L$. We let $P=P(\chi:\mu)$
stand for the set of all $p\in L_+$ with $\|p\|_\mu=1$. Evidently, $P$
is a convex subset of $L_+$, and $p\in L_+$ if and only if
$p/\|p\|_\mu\in P$.

In the probability context, a random variable $X$ takes values in the
sample space $\chi$ according to a probability distribution $p$ assumed
to belong to $P$. If $X$ is a continuous random variable, $\mu$ will be
the Lebesgue measure on the Borel sets of a euclidean sample space
$\chi$ and, if $X$ is discrete, $\mu$ is taken as a counting measure on
the sets of a countable sample space $\chi$.

Let $\theta=(\theta_1,\dots,\theta_n)$ be a set of real continuous
parameters belonging to a parameter space $\Theta$, a manifold embedded
in $\mathbb{R}^n$, and let
$F_\Theta=\{p(\cdot|\theta)\in L_+:\theta\in\Theta\}$ be a parametric
family of positive distributions $p=p(\cdot|\theta)$,
$\theta\in\Theta$, with some regularity properties not mentioned
explicitly to avoid lengthy discussion (see, however, [3,7,12]). For
example, it is implicitly assumed that

$$\partial_i p=\partial_i p(\cdot|\theta)=\partial p(\cdot|\theta)/\partial\theta_i \quad (p=p(\cdot|\theta)\in F_\Theta,\ i=1,\dots,n)$$

is in $M$ for every $\theta\in\Theta$. It is also assumed that, for a
fixed $\theta\in\Theta$, the $n$ functions $\{\partial_i p\}_{i=1}^n$
are linearly independent over $\chi$. We also consider a parametric
family of probability distributions
$P_\Theta=\{p(\cdot|\theta)\in P:\theta\in\Theta\}$ which may be viewed
as a convex subfamily of $F_\Theta$.

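Let $f$ be a continuous and positive function on $\mathbb{R}_+$ and
define

$$ds_f^2(\theta)\equiv\int_\chi\frac{f(p)}{p}\,[dp]^2\,d\mu \quad (\theta\in\Theta,\ p=p(\cdot|\theta)\in F_\Theta),$$

where in the integrand the dependence on $x\in\chi$ and
$\theta\in\Theta$ is suppressed and where

$$dp=dp(\cdot|\theta)=\sum_{i=1}^n(\partial_i p)\,d\theta_i.$$

Here and throughout the remaining parts of this entry we shall use
freely the convention of suppressing the dependence on $x\in\chi$ and
$\theta\in\Theta$. Thus, with this convention,

(1.1)  $$ds_f^2=\sum_{i,j=1}^n g_{ij}^{(f)}\,d\theta_i\,d\theta_j,\qquad g_{ij}^{(f)}=\int_\chi\frac{f(p)}{p}(\partial_i p)(\partial_j p)\,d\mu.$$

It follows that the $n\times n$ matrix $[g_{ij}^{(f)}(\theta)]$ is
positive-definite for every $\theta\in\Theta$ and hence $ds_f^2$ gives
a Riemannian metric on $\Theta$. Alternative expressions for these
quantities are available through the language of expectations. Thus,
for $p=p(\cdot|\theta)\in F_\Theta$,

$$ds_f^2(\theta)=E_\theta[(f\circ p)(d\log p)^2]$$

and

$$g_{ij}^{(f)}(\theta)=E_\theta[(f\circ p)(\partial_i\log p)(\partial_j\log p)].$$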
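The definition of the f-information matrix can be tried out numerically on a finite sample space with the counting measure. The following is a minimal sketch under stated assumptions: the helper names `f_information_matrix` and `binomial_pmf` are hypothetical, the partial derivatives are approximated by central differences, and the binomial family with f = 1 is used only as a test case whose Fisher information N/(θ(1-θ)) is standard.

```python
import math

def f_information_matrix(pmf, theta, support, f=lambda p: 1.0, h=1e-5):
    """Approximate g_ij^(f)(theta) = sum_x f(p)/p * d_i p * d_j p on a finite support.

    pmf(x, theta) returns p(x | theta); the partial derivatives of p with
    respect to theta_i are approximated by central differences of step h.
    """
    n = len(theta)

    def partial(x, i):
        up = list(theta)
        dn = list(theta)
        up[i] += h
        dn[i] -= h
        return (pmf(x, up) - pmf(x, dn)) / (2.0 * h)

    g = [[0.0] * n for _ in range(n)]
    for x in support:
        p = pmf(x, theta)
        grads = [partial(x, i) for i in range(n)]
        weight = f(p) / p
        for i in range(n):
            for j in range(n):
                g[i][j] += weight * grads[i] * grads[j]
    return g

def binomial_pmf(x, theta, N=10):
    # One-parameter binomial family; theta is passed as a list of length 1.
    t = theta[0]
    return math.comb(N, x) * t**x * (1.0 - t)**(N - x)

g = f_information_matrix(binomial_pmf, [0.3], range(11))
print(g[0][0], 10 / (0.3 * 0.7))
```

Passing another positive weight function as `f` gives the corresponding f-information matrix with no other change.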


In the theory of information (see [6]) the quantity
$-\log p(\cdot|\theta)$, for $p(\cdot|\theta)\in P_\Theta$, is known as
the amount of "self-information" associated with the state
$\theta\in\Theta$. The self-information for the nearby state
$\theta+\delta\theta\in\Theta$ is then
$-\log p(\cdot|\theta+\delta\theta)$. To the first order, the
difference between the self-informations associated with these states
is given by

$$d\log p=\sum_{i=1}^n(\partial_i\log p)\,d\theta_i$$

and hence $ds_f^2(\theta)$ measures the weighted average of the square
of this first order difference with the weight $f[p(\cdot|\theta)]$.
For this reason, the metric $ds_f^2$ and the matrix $[g_{ij}^{(f)}]$
are called the "f-information metric" and the "f-information matrix",
respectively.

As is well-known from differential geometry, $g_{ij}^{(f)}$
$(i,j=1,\dots,n)$ is a covariant symmetric tensor of the second order
for all $\theta\in\Theta$, and hence $ds_f^2$ is invariant under the
admissible transformations of the parameters. Let $\theta=\theta(t)$,
$t_1\le t\le t_2$, be a curve in $\Theta$ joining the points
$\theta^{(1)},\theta^{(2)}\in\Theta$ with $\theta^{(j)}=\theta(t_j)$
$(j=1,2)$. Since $ds_f=(ds_f^2)^{1/2}$ is the line element of the
metric $ds_f^2$, the distance between these points along this curve is

$$\left|\int_{t_1}^{t_2}\frac{ds_f}{dt}\,dt\right|=\left|\int_{t_1}^{t_2}\Big\{\sum_{i,j=1}^n g_{ij}^{(f)}(\theta)\,\dot\theta_i\dot\theta_j\Big\}^{1/2}dt\right|,$$

where a dot denotes differentiation with respect to the
curve-parameter $t$. The geodesic curve, namely the curve joining
$\theta^{(1)}$ and $\theta^{(2)}$ such that the above distance is the
shortest, is called the "f-information geodesic curve" along
$\theta^{(1)}$ and $\theta^{(2)}$, while the resulting distance
$S_f(\theta^{(1)},\theta^{(2)})$ is called the "f-information geodesic
distance" between $\theta^{(1)}$ and $\theta^{(2)}$. The f-information
geodesic curve $\theta=\theta(t)$ may be determined from the
Euler-Lagrange equations


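(1.2)  $$\sum_{i=1}^n g_{ik}^{(f)}\ddot\theta_i+\sum_{i,j=1}^n\Gamma_{ijk}^{(f)}\dot\theta_i\dot\theta_j=0 \quad (k=1,\dots,n)$$

and from the boundary conditions

$$\theta_i(t_j)=\theta_i^{(j)} \quad (i=1,\dots,n;\ j=1,2).$$

Here, the quantity $\Gamma_{ijk}^{(f)}$ is given by

(1.3)  $$\Gamma_{ijk}^{(f)}=\tfrac12\big[\partial_i g_{jk}^{(f)}+\partial_j g_{ki}^{(f)}-\partial_k g_{ij}^{(f)}\big]$$

and is known as the "Christoffel symbol of the first kind" for the
metric $ds_f^2$.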

By the very definition of the f-information geodesic curve
$\theta=\theta(t)$, its tangent vector $\dot\theta=\dot\theta(t)$ is of
constant length with respect to the metric $ds_f^2$. Thus,

(1.4)  $$\{\dot s_f(\theta(t))\}^2=\sum_{i,j=1}^n g_{ij}^{(f)}\dot\theta_i\dot\theta_j=\text{const.}$$

The constant may be chosen to be of value 1 when the curve-parameter
$t$ is chosen to be the arc-length parameter $s$, $0\le s\le s_0$, with
$s_0\equiv S_f(\theta^{(1)},\theta^{(2)})$,
$\theta(0)=\theta^{(1)}$ and $\theta(s_0)=\theta^{(2)}$. It is also
clear that the f-information geodesic distance $S_f$ on the parameter
space $\Theta$ is invariant under the admissible transformations of the
parameters as well as of the random variables.
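The length of a curve under a given metric tensor can be approximated by summing the lengths of short chords. The sketch below is an illustration only: `curve_length` and `normal_metric` are hypothetical names, and the test metric is the univariate normal information metric in the coordinates (μ, σ) as it is stated in Section 4.2, namely diag(1/σ², 2/σ²). Along the vertical line μ = const from σ = 1 to σ = e the exact length is √2.

```python
import math

def curve_length(metric, curve, t0, t1, steps=2000):
    """Approximate the integral of sqrt(g_ij theta'_i theta'_j) dt along a curve.

    metric(theta) returns the matrix [g_ij] at the point theta;
    curve(t) returns the point theta(t). Each chord is measured with the
    metric evaluated at the chord midpoint.
    """
    total = 0.0
    prev = curve(t0)
    for k in range(1, steps + 1):
        t = t0 + (t1 - t0) * k / steps
        cur = curve(t)
        mid = [(a + b) / 2.0 for a, b in zip(prev, cur)]
        d = [b - a for a, b in zip(prev, cur)]
        g = metric(mid)
        n = len(d)
        q = sum(g[i][j] * d[i] * d[j] for i in range(n) for j in range(n))
        total += math.sqrt(q)
        prev = cur
    return total

def normal_metric(theta):
    # Assumed metric of the normal family in (mu, sigma), taken from Section 4.2.
    mu, sigma = theta
    return [[1.0 / sigma**2, 0.0], [0.0, 2.0 / sigma**2]]

def vertical_line(t):
    # mu fixed at 0, sigma running from 1 (t = 0) to e (t = 1).
    return [0.0, math.exp(t)]

length = curve_length(normal_metric, vertical_line, 0.0, 1.0)
print(length, math.sqrt(2.0))
```

The geodesic distance is the infimum of such lengths over all curves joining the two points, so any single curve gives an upper bound on it.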

The metric $ds_f^2(\theta)$ may also be regarded as a functional of
$p(\cdot|\theta)\in F_\Theta$. This functional is convex in
$p(\cdot|\theta)\in F_\Theta$ if and only if the function
$F(x)=x/f(x)$ is concave on $\mathbb{R}_+$. In

particular, if $f$ is also a $C^2$-function on $\mathbb{R}_+$ then this
holds  if  and  only  if  FF">2(F’)  on  iR+.  The  choice  of  f(x)  = 
$x^{\alpha-1}$ gives the "α-order information metric"

(1.5)  $$ds_\alpha^2(\theta)=E_\theta[p^{\alpha-1}(d\log p)^2]$$

with the corresponding "α-order information matrix"

(1.6)  $$g_{ij}^{(\alpha)}(\theta)=E_\theta[p^{\alpha-1}(\partial_i\log p)(\partial_j\log p)]$$

and the "α-order information geodesic distance" $S_\alpha$ on
$\Theta$. It follows that $ds_\alpha^2(\theta)$ is convex in
$p(\cdot|\theta)\in F_\Theta$ if and only if $1\le\alpha\le 2$. We drop
the suffix α when α=1. Then, $ds^2$ is known as the "information
metric" or the "Fisher amount of information" while $[g_{ij}]$ is the
well-known "information matrix", and the corresponding geodesic
distance $S$ is the "Rao distance" (see [3,7,26]). We also note that

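$$g_{ij}^{(\alpha)}(\theta)=\int_\chi p^{\alpha-1}\,\partial_i\partial_j p\,d\mu-\int_\chi p^{\alpha}\,\partial_i\partial_j\log p\,d\mu.$$

Moreover, for α≠0,

$$g_{ij}^{(\alpha)}(\theta)=\alpha^{-2}\,\partial_i\partial_j\int_\chi p^{\alpha}\,d\mu-\alpha^{-1}\int_\chi p^{\alpha}\,\partial_i\partial_j\log p\,d\mu.$$

In particular,

$$g_{ij}(\theta)=\partial_i\partial_j\int_\chi p\,d\mu-\int_\chi p\,\partial_i\partial_j\log p\,d\mu$$

and thus

(1.7)  $$g_{ij}(\theta)=-\int_\chi p\,\partial_i\partial_j\log p\,d\mu=-E_\theta(\partial_i\partial_j\log p) \quad (p(\cdot|\theta)\in P_\Theta).$$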
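The two expressions for the information matrix, the expected product of first derivatives of log p and the negative expected second derivative in (1.7), can be compared on a small discrete family. This is a sketch only: the binomial family is chosen as a convenient test case, the derivatives of log p are written out by hand, and the function names are hypothetical.

```python
import math

def binomial_pmf(x, theta, N):
    return math.comb(N, x) * theta**x * (1.0 - theta)**(N - x)

def fisher_two_ways(theta, N):
    """Return (E[(d log p)^2], -E[d^2 log p]) for the binomial family.

    For log p = const + x log(theta) + (N - x) log(1 - theta) the first
    and second derivatives in theta are available in closed form.
    """
    outer = 0.0
    hess = 0.0
    for x in range(N + 1):
        p = binomial_pmf(x, theta, N)
        score = x / theta - (N - x) / (1.0 - theta)
        second = -x / theta**2 - (N - x) / (1.0 - theta)**2
        outer += p * score**2
        hess -= p * second
    return outer, hess

outer, hess = fisher_two_ways(0.3, 12)
print(outer, hess, 12 / (0.3 * 0.7))
```

Both sums agree with N/(θ(1-θ)) up to rounding, as (1.7) requires for a family of probability distributions.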

The metric $ds_f^2(\theta)$ arises as the second order differential of
certain entropy or divergence functionals along the direction of the
tangent space of $\Theta$ at $\theta\in\Theta$. See [7,12] for more
details (see also [6]).

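For example, let $F(\cdot,\cdot)$ be a $C^2$-function on
$\mathbb{R}_+\times\mathbb{R}_+$ and consider the "F-divergence"

$$D_F(p,q)=\int_\chi F[p(x),q(x)]\,d\mu(x) \quad (p,q\in M_+).$$

We shall also assume that $F$ satisfies the following additional
properties: (i) $F(x,\cdot)$ is strictly convex on $\mathbb{R}_+$ for
every $x\in\mathbb{R}_+$; (ii) $F(x,x)=0$ for every
$x\in\mathbb{R}_+$; (iii) $\partial_y F(x,y)|_{y=x}=\text{const.}$ for
every $x\in\mathbb{R}_+$. For $p(\cdot|\theta^{(1)})$ and
$p(\cdot|\theta^{(2)})$ in $P_\Theta$ we write

$$\mathcal{D}_F(\theta^{(1)},\theta^{(2)})=D_F[p(\cdot|\theta^{(1)}),p(\cdot|\theta^{(2)})] \quad (\theta^{(1)},\theta^{(2)}\in\Theta).$$

Then, for $p(\cdot|\theta)\in P_\Theta$ and $\theta\in\Theta$,

$$\mathcal{D}_F(\theta,\theta)=0,\qquad d\mathcal{D}_F(\theta,\theta)=\int_\chi\partial_y F(p,y)|_{y=p}\,(dp)\,d\mu=0$$

and

$$d^2\mathcal{D}_F(\theta,\theta)=ds_f^2(\theta)$$

where

$$f(x)=x\,\partial_y^2F(x,y)|_{y=x} \quad (x\in\mathbb{R}_+).$$

It follows that, to the second order in the infinitesimal
displacements,

$$\mathcal{D}_F(\theta,\theta+\delta\theta)=\tfrac12\,ds_f^2(\theta).$$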
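The second-order relation between a divergence and the metric can be checked numerically in the simplest case. The sketch below assumes the choice F(x,y) = x log(x/y) - x + y, which satisfies (i)-(iii) and gives f = 1, so that the F-divergence of two probability distributions is the Kullback-Leibler divergence and the metric is the Fisher metric. The Bernoulli family and the function name `kl_bernoulli` are illustrative choices, not part of the text.

```python
import math

def kl_bernoulli(a, b):
    """F-divergence with F(x, y) = x log(x/y) - x + y between Bernoulli(a) and Bernoulli(b).

    The terms -x + y cancel after summation because both arguments are
    probability distributions.
    """
    return a * math.log(a / b) + (1.0 - a) * math.log((1.0 - a) / (1.0 - b))

theta = 0.3
delta = 1e-3
fisher = 1.0 / (theta * (1.0 - theta))     # Bernoulli information, g = 1/(theta(1-theta))
second_order = 0.5 * fisher * delta**2     # (1/2) ds^2 for the displacement delta
divergence = kl_bernoulli(theta, theta + delta)
print(divergence, second_order)
```

The ratio of the two printed numbers tends to 1 as the displacement shrinks; the residual is of third order in delta.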


2. Properties of the Information Metric.

We shall describe some further properties of the f-information metric
$ds_f^2$. However, for reasons of clarity and economy we shall restrict
ourselves here to the case of the ordinary information metric $ds^2$
(i.e. when $f(x)=1$ or when α=1 in (1.5)) on the parameter space
$\Theta$ of probability distributions $p(\cdot|\theta)$ in $P_\Theta$.
A more general discussion may be found in [7,8]. We shall hereafter
also assume that the summation is taken without the symbol $\Sigma$
when the indices are repeated twice and that the extent of the
summation is understood as running from 1 to n. Thus, with this
convention, we have, by virtue of (1.1), (1.3), (1.6) and (1.7),

$$ds^2=g_{ij}\,d\theta_i\,d\theta_j,$$

$$g_{ij}=E_\theta[(\partial_i\log p)(\partial_j\log p)]=-E_\theta[\partial_i\partial_j\log p]$$

and

(2.1)  $$\Gamma_{ijk}=\tfrac12[\partial_i g_{jk}+\partial_j g_{ki}-\partial_k g_{ij}].$$


The information geodesic curves $\theta=\theta(s)$, where $s$ is the
arc-length parameter, are determined, in view of (1.2), by

(2.2)  $$g_{ik}\ddot\theta_i+\Gamma_{ijk}\dot\theta_i\dot\theta_j=0 \quad (k=1,\dots,n).$$

Moreover, from (1.4) we also have

(2.3)  $$g_{ij}\dot\theta_i\dot\theta_j=1.$$

Thus, for two points $a,b\in\Theta$, or for
$p(\cdot|a),p(\cdot|b)\in P_\Theta$, the Rao distance $S(a,b)$ is
completely determined by (2.2), a system of n second order
(non-linear) differential equations, and by the 2n boundary conditions
$\theta(0)=a$ and $\theta(s_0)=b$ with $s_0=S(a,b)$. This computation
may be facilitated with the aid of the normalization (2.3).


We denote by $I$ the Fisher information matrix $[g_{ij}]$, by $g^{ij}$
the elements of its inverse $I^{-1}$ and, as usual, the elements of the
unit matrix are denoted by the Kronecker delta $\delta_{ij}$. Note that
$I^{-1}=[g^{ij}]$ is also positive-definite and that $I$ is associated
with a distribution $p(\cdot|\theta)\in P_\Theta$ of a random variable
$X$. We list the following properties (see Rao [27, p. 323-332] for
more details):

1°. Let $I_1$ and $I_2$ be the information matrices due to two
independent random variables $X_1$ and $X_2$. Then $I=I_1+I_2$ is the
information matrix due to $X=(X_1,X_2)$ jointly.

2°. Let $I_T$ be the information matrix due to a function $T$ of $X$.
Then $I-I_T$ is semi positive-definite.

3°. Let $p(\cdot|\theta)\in P_\Theta$ with the corresponding
information matrix $I$. Assume that $\mathbf{f}=(f_1,\dots,f_m)$ is a
vector of m statistics (random variables) and define
$\mathbf{g}(\theta)=(g_1(\theta),\dots,g_m(\theta))$ by
$g_i(\theta)=E_\theta(f_i)$ $(i=1,\dots,m)$, i.e. $\mathbf{f}$ is an
unbiased estimator of $\mathbf{g}(\theta)$. Consider the $m\times m$
and $m\times n$ matrices $V=[V_{ij}]$ and $U=[U_{ij}]$ given by
$V_{ij}=E_\theta[(f_i-g_i)(f_j-g_j)]$ $(i,j=1,\dots,m)$ and
$U_{ij}=E_\theta[f_i\,\partial_j\log p]$
$(i=1,\dots,m;\ j=1,\dots,n)$. Then:

(i) The $m\times m$ matrix $V-UI^{-1}U'$ is semi positive-definite for
every $\theta\in\Theta$. The matrix is zero at some $\theta\in\Theta$
if and only if $\mathbf{f}=(f_1,\dots,f_m)$ is of the form
$f_i=\lambda_{ik}\,\partial_k\log p+E_\theta(f_i)$ $(i=1,\dots,m)$.

(ii) Suppose, in addition, that
$\partial_j\int_\chi f_i(x)p(x|\theta)\,d\mu(x)=\int_\chi f_i(x)\,\partial_j p(x|\theta)\,d\mu(x)$
$(i=1,\dots,m;\ j=1,\dots,n)$. Then $U$ is the Jacobian-matrix
$[\partial_j g_i]$ of $\mathbf{g}=(g_1,\dots,g_m)$ with respect to
$\theta=(\theta_1,\dots,\theta_n)$. In particular, when $m=n$ and
$\mathbf{g}(\theta)=\theta$, i.e. $\mathbf{f}$ is an unbiased estimator
of $\theta$, then $V-I^{-1}$ is semi positive-definite.

The last property constitutes the celebrated "Cramér-Rao lower bound
theorem", namely that for any unbiased estimator of $\theta$, its
covariance matrix dominates the inverse of the Fisher information
matrix.
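A one-parameter instance of the bound can be verified by exact summation. The following sketch assumes the binomial family with N trials and the unbiased estimator x/N of θ; both are illustrative choices and the function name is hypothetical. In this case the bound is attained, since the estimator is an affine function of the derivative of log p, in line with property 3°(i).

```python
import math

def estimator_variance_and_bound(theta, N):
    """Return (mean, variance, 1/g) for the estimator x/N under the binomial family.

    g is the Fisher information, computed here as the expected squared
    derivative of log p rather than from a closed form.
    """
    pmf = [math.comb(N, x) * theta**x * (1.0 - theta)**(N - x) for x in range(N + 1)]
    mean = sum(p * x / N for x, p in enumerate(pmf))
    var = sum(p * (x / N - mean)**2 for x, p in enumerate(pmf))
    fisher = sum(p * (x / theta - (N - x) / (1.0 - theta))**2 for x, p in enumerate(pmf))
    return mean, var, 1.0 / fisher

mean, var, bound = estimator_variance_and_bound(0.3, 12)
print(mean, var, bound)
```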


3. Information Connections and Curvatures.

The information metric renders the parameter space $\Theta$ as a
Riemannian manifold with the metric tensor $g_{ij}$ associated with the
distribution $p(\cdot|\theta)\in P_\Theta$. In this context, the
Christoffel symbol of the first kind $\Gamma_{ijk}$ in (2.1) is called
the "first information connection". As is well known from differential
geometry, this natural affine connection induces a parallelism on
$\Theta$, known as the "Levi-Civita parallelism", which is compatible
with the metric tensor $g_{ij}$, in the sense that the covariant
differentiation of the latter vanishes for this connection. Using the
summation convention, one introduces the "Christoffel symbol of the
second kind" by

(3.1)  $$\Gamma^k_{ij}=\Gamma_{ijm}\,g^{mk}.$$

This is also called the "second information connection". With the aid
of this connection, the equation for the information geodesic curves
(2.2) assumes the alternative form

(3.2)  $$\ddot\theta_k+\Gamma^k_{ij}\dot\theta_i\dot\theta_j=0 \quad (k=1,\dots,n).$$

In differential geometry one also considers the "Riemann-Christoffel
tensor of the second kind"

(3.3)  $$R^\ell_{ijk}=\partial_j\Gamma^\ell_{ik}-\partial_k\Gamma^\ell_{ij}+\Gamma^m_{ik}\Gamma^\ell_{mj}-\Gamma^m_{ij}\Gamma^\ell_{mk}$$

and the "Riemann-Christoffel tensor of the first kind"

(3.4)  $$R_{ijk\ell}=R^m_{jk\ell}\,g_{mi}.$$

These quantities are also known as the "second information curvature
tensor" and the "first information curvature tensor", respectively. In
this respect, it is worthwhile noticing that

$$R_{ijk\ell}=-R_{jik\ell}=-R_{ij\ell k}=R_{k\ell ij},$$

$$R_{ijk\ell}+R_{ik\ell j}+R_{i\ell jk}=0$$

and that the number of distinct nonvanishing components of the tensor
$R_{ijk\ell}$ is $n^2(n^2-1)/12$. The latter reduces to 0 when n=1 and
to 1 when n=2.

The "mean Gaussian curvature", in the directions of
$x=(x_1,\dots,x_n)$ and $y=(y_1,\dots,y_n)$ of $\mathbb{R}^n$, is given
by

(3.5)  $$\kappa(\theta:x,y)=\frac{R_{ijk\ell}\,x_iy_jx_ky_\ell}{(g_{ik}g_{j\ell}-g_{i\ell}g_{jk})\,x_iy_jx_ky_\ell} \quad (\theta\in\Theta)$$

and is also called the "information-curvature" in the directions of
$x$ and $y$. This curvature is identically zero if $\Theta$ is
euclidean and is constant if the space $\Theta$ is isotropic (i.e. when
$\kappa$ is independent of the directions $x$ and $y$), provided n>2.


Besides the first information connection $\Gamma_{ijk}$ there are, of
course, other connections leading to parallelisms which differ from the
Levi-Civita parallelism. However, in the context of statistical
inference, the choice of such connections should reflect the structure
of the distributions in some meaningful manner. Following an idea of
Dawid [11], Amari [1] considers the one-parameter family of affine
connections given by

$$\Gamma^{(\alpha)}_{ijk}\equiv\Gamma_{ijk}-\frac{\alpha}{2}\,T_{ijk} \quad (\alpha\in\mathbb{R})$$

where $T_{ijk}$ is the symmetric tensor

$$T_{ijk}\equiv E_\theta[(\partial_i\log p)(\partial_j\log p)(\partial_k\log p)].$$

The connection $\Gamma^{(\alpha)}_{ijk}$ is called the "α-connection".
Thus, in this context, the first information connection is the
0-connection. An alternative expression for the α-connection is

$$\Gamma^{(\alpha)}_{ijk}=E_\theta[(\partial_i\partial_j\log p)(\partial_k\log p)]+\frac{1-\alpha}{2}\,T_{ijk}.$$

The 1-connection was introduced first by Efron [13] and hence is also
called the "Efron-connection". The (-1)-connection, on the other hand,
is called the "Dawid-connection", after Dawid [11] who was first to
suggest its introduction.

In order to elucidate the relevance and the meaningfulness of the
α-connections in statistical problems, we consider two examples
suggested by Dawid [11] and described in Amari [1].

Example 1. We consider an exponential family $P_\Theta$ of
distributions $p(\cdot|\theta)$ given (using the summation convention)
by

(3.6)  $$p(x|\theta)=\exp\{T(x)+T_i(x)\theta_i-\psi(\theta)\} \quad (x\in\chi)$$

with

(3.7)  $$e^{\psi(\theta)}=\int_\chi e^{T_i(x)\theta_i}\,e^{T(x)}\,d\mu(x) \quad (\theta\in\Theta),$$

and specified by the natural free parameters
$\theta=(\theta_1,\dots,\theta_n)\in\Theta$. Here $\psi$ is a
$C^2$-function on $\Theta$, and $T$ and $T_1,\dots,T_n$ are measurable
functions on $\chi$. Under these circumstances, we have

$$\partial_i\log p=T_i(x)-\partial_i\psi(\theta),\qquad \partial_i\partial_j\log p=-\partial_i\partial_j\psi.$$

Therefore

(3.8)  $$g_{ij}=\partial_i\partial_j\psi,$$

(3.9)  $$E_\theta[(\partial_i\partial_j\log p)(\partial_k\log p)]=0$$

and

(3.10)  $$\Gamma^{(\alpha)}_{ijk}=\frac{1-\alpha}{2}\,T_{ijk}.$$

Since $\Gamma^{(\alpha)}_{ijk}(\theta)$ is identically zero for α=1, we
find that the exponential family constitutes an uncurved space with
respect to the Efron-connection. For this reason, the Efron-connection
may also be called the "exponential-connection".
Example 2. We consider a family $P_\Theta$ of distributions
$p(\cdot|\theta)$ given by a mixture of n+1 prescribed linearly
independent probability distributions $q_1,\dots,q_{n+1}$ on $\chi$,

$$p(x|\theta)=q_i(x)\theta_i+q_{n+1}(x)\theta_{n+1} \quad (x\in\chi)$$

where

$$\theta_{n+1}=1-(\theta_1+\dots+\theta_n)$$

and $\theta\in\Theta$ with

$$\Theta=\{\theta=(\theta_1,\dots,\theta_n)\in\mathbb{R}^n_+:\theta_{n+1}>0\}.$$

In this case, we have

$$\partial_i\log p=p^{-1}(q_i-q_{n+1}),\qquad \partial_i\partial_j\log p=-(\partial_i\log p)(\partial_j\log p).$$

Therefore

$$E_\theta[(\partial_i\partial_j\log p)(\partial_k\log p)]=-T_{ijk}$$

and

$$\Gamma^{(\alpha)}_{ijk}=-\frac{1+\alpha}{2}\,T_{ijk}.$$

It follows, since $\Gamma^{(\alpha)}_{ijk}(\theta)$ is identically zero
for α=-1, that this family of mixture distributions constitutes an
uncurved space with respect to the Dawid-connection. For this reason,
the Dawid-connection is also called the "mixture-connection".
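The vanishing of the (-1)-connection on a mixture family can be checked on a small case. The sketch below takes n = 1, a three-point sample space with the counting measure, and two arbitrary component distributions `q1` and `q2`; these choices and the function name are assumptions made for the illustration. The connection coefficient is assembled from the expectation form E[(∂∂ log p)(∂ log p)] + ((1-α)/2) T.

```python
def alpha_connection_mixture(theta, q1, q2, alpha):
    """Return (Gamma^(alpha)_111, T_111) for p = theta*q1 + (1 - theta)*q2.

    Uses d log p = (q1 - q2)/p and d^2 log p = -(d log p)^2, which hold
    for any mixture that is linear in theta.
    """
    mixed = 0.0   # E[(d^2 log p)(d log p)]
    T = 0.0       # E[(d log p)^3]
    for a, b in zip(q1, q2):
        p = theta * a + (1.0 - theta) * b
        score = (a - b) / p
        second = -score**2
        mixed += p * second * score
        T += p * score**3
    return mixed + 0.5 * (1.0 - alpha) * T, T

q1 = [0.2, 0.3, 0.5]
q2 = [0.6, 0.3, 0.1]
gamma_dawid, T = alpha_connection_mixture(0.4, q1, q2, alpha=-1.0)
gamma_levi_civita, _ = alpha_connection_mixture(0.4, q1, q2, alpha=0.0)
print(gamma_dawid, gamma_levi_civita, -0.5 * T)
```

For this family the skewness tensor T is nonzero, so the Levi-Civita coefficient (α = 0) does not vanish in the mixture parameter, while the α = -1 coefficient does.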

Once the α-connection is adopted, the other related quantities
$\Gamma^{(\alpha)k}_{ij}$, $R^{(\alpha)\ell}_{ijk}$,
$R^{(\alpha)}_{ijk\ell}$ and $\kappa^{(\alpha)}$ are determined by the
same rules, (3.1) and (3.3)-(3.5), for determining the corresponding
quantities when α=0. For example,

$$\Gamma^{(\alpha)k}_{ij}=\Gamma^{(\alpha)}_{ijm}\,g^{mk}$$

and, corresponding to (3.2), the equation

$$\ddot\theta_k+\Gamma^{(\alpha)k}_{ij}\dot\theta_i\dot\theta_j=0 \quad (k=1,\dots,n)$$

gives the "straight-lines" $\theta=\theta(t)$ with respect to the
α-connection. When α=0 these "straight-lines" are also the information
geodesic curves. This is not necessarily so when α≠0 for, in this case,
the α-connection is not compatible with the metric tensor $g_{ij}$.

The theory of α-connections and their curvatures seems to be
particularly applicable in elucidating the structures of the
exponential families as well as of the curved exponential families of
distributions. An exponential family may be written in the form
(3.6)-(3.7) by choosing natural parameters
$\theta=(\theta_1,\dots,\theta_n)$ which are uniquely determined within
affine transformations. In this case, $(T_1,\dots,T_n)$ constitutes a
sufficient statistic for the family and has a covariance matrix $V$
which equals $I$. In particular, the corresponding Cramér-Rao lower
bound, in property 3°(i) of the previous section, is always attained.
Moreover, the natural parameter space $\Theta$ is convex, and, by
(3.8), $\psi$ is convex on $\Theta$. A use of (3.5)-(3.10) shows that
the α-Riemann-Christoffel curvature tensor of the space is given by

$$R^{(\alpha)}_{ijk\ell}=\frac{1-\alpha^2}{4}\,[T_{jrk}T_{im\ell}-T_{jr\ell}T_{imk}]\,g^{mr}.$$

Initially, this formula is valid only for the natural coordinate
system. However, since the formula is given by means of a tensorial
equation, its validity does not depend on a particular choice of the
coordinates. It follows that for any exponential family

$$R^{(\alpha)}_{ijk\ell}=(1-\alpha^2)\,R_{ijk\ell},$$

and hence the Efron and the Dawid connections (i.e. when α=1 and α=-1)
render the space $\Theta$ as "flat" (or with an "absolute
parallelism").

The curved exponential families can be embedded in the exponential
families as subspaces (Efron [13,14]). Using this observation, one
shows that these families possess various dualistic structures: the
Barndorff-Nielsen duality [4] associated with the Legendre
transformation, the α-(-α) duality [1] between two kinds of connections
and the α-(-α) duality [1] between two kinds of curvatures. As shown by
Amari [1], these dualities are intimately connected and, moreover, the
second-order information loss is expressed in terms of the curvatures
of the statistical model and the estimator. We refer to Amari [1],
Barndorff-Nielsen [4], Dawid [11], Efron [13,14] and Reeds [28] for a
more detailed account of these statistical connections. For the general
study of connections and curvatures, we refer to the books of Eisenhart
[15,16], Hicks [18], Laugwitz [20] and Schouten [29].


4. Informative Geometry of Specific Families of Distributions.

An informative geometry of distributions $p(\cdot|\theta)\in P_\Theta$
is the geometry associated with the natural affine connection
$\Gamma_{ijk}$ of the information metric $ds^2$. We shall briefly
describe the informative geometries of certain well-known families of
distributions $P_\Theta$. This description includes the evaluations of
the curvature and the Rao distance for the family $P_\Theta$. Note,
however, that, as mentioned in the previous section, the α-curvature
based on the α-connection of Amari [1] is $(1-\alpha^2)$ times the
present information curvature, provided $P_\Theta$ is an exponential
family.

4.1. Univariate Distributions.

Here $\Theta$ is an interval and $ds^2(\theta)=g\,(d\theta)^2$ with

$$g=g(\theta)=g_{11}=-E_\theta(\partial^2\log p) \quad (p(\cdot|\theta)\in P_\Theta).$$

The curvature is always zero and the connection is
$\tfrac12 g'(\theta)$. The latter can be made to vanish identically by
reparametrizing $\theta\in\Theta$ to $s\in\Theta^*$, where
$(s'(\theta))^2=g(\theta)$. The Rao distance of $a,b\in\Theta$ is given
by

$$S(a,b)=\left|\int_a^b\sqrt{g(\theta)}\,d\theta\right|.$$
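The univariate formula reduces the Rao distance to a one-dimensional quadrature. The sketch below approximates the integral by the midpoint rule; the function name `rao_distance_1d` is hypothetical, and the test metric g(θ) = r/θ² is the Fisher information of the gamma family with index r, whose closed-form distance √r |log(a/b)| is listed later in this section.

```python
import math

def rao_distance_1d(g, a, b, steps=10000):
    """Approximate S(a, b) = | integral from a to b of sqrt(g(theta)) dtheta |.

    g is the (scalar) information as a function of theta; the integral is
    evaluated with the composite midpoint rule.
    """
    h = (b - a) / steps
    total = sum(math.sqrt(g(a + (k + 0.5) * h)) for k in range(steps))
    return abs(total * h)

r = 2.5

def gamma_information(theta):
    # Assumed test case: gamma family with index r, g(theta) = r / theta^2.
    return r / theta**2

numeric = rao_distance_1d(gamma_information, 1.0, 3.0)
closed_form = math.sqrt(r) * abs(math.log(1.0 / 3.0))
print(numeric, closed_form)
```

Swapping the end points changes the sign of the integral but not the distance, because of the absolute value.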

For example, for a one-parameter exponential family

$$p(x|\theta)=\exp\{T(x)+t(x)\phi(\theta)-\psi(\theta)\}$$

we find that $g=(\phi')^2\sigma^2>0$ where
$\sigma^2=E_\theta[(t-\omega)^2]$ with $\omega=E_\theta(t)$, and,
moreover,

$$\omega=\psi'/\phi',\qquad \sigma^2=\omega'/\phi'.$$

A special case is the (generalized) Weibull distribution

(4.1)  $$p(x|\theta)=T'(x)\,\phi(\theta)\exp\{-T(x)\phi(\theta)\} \quad (x>0,\ \theta\in\Theta)$$

with respect to the Lebesgue measure on $\chi=\mathbb{R}_+$. Here $T$
is a non-negative differentiable function on $\chi$ with $T(0)=0$, and
is monotonically increasing to $\infty$. We also assume that
$\phi(\theta)>0$ for every $\theta\in\Theta$. In this case
$g=\{(\log\phi)'\}^2$ and thus

$$S(a,b)=|\log(\phi(a)/\phi(b))| \quad (a,b\in\Theta).$$
We now list some other nondiscrete cases:

1°. Gamma Distribution. Here

$$p(x|\theta)=[\Gamma(r)]^{-1}x^{r-1}e^{-x\theta}\theta^r$$

with respect to the Lebesgue measure on $\chi=\mathbb{R}_+$. The
parameter space $\Theta$ is $\mathbb{R}_+$ and r>0 is the index of the
gamma distribution. In this case

$$S(a,b)=\sqrt{r}\,|\log(a/b)| \quad (a,b\in\mathbb{R}_+).$$

2°. Weibull Distribution.

$$p(x|\theta)=r\,x^{r-1}\theta\,e^{-x^r\theta}$$

with respect to the Lebesgue measure on $\chi=\mathbb{R}_+$. Here
$\Theta=\mathbb{R}_+$ and r>0 is the index of the Weibull distribution.
This is a special case of (4.1), and

$$S(a,b)=|\log(a/b)| \quad (a,b\in\mathbb{R}_+).$$

3°. Pareto Distribution.

$$p(x|\theta)=\theta\,r^\theta x^{-(\theta+1)} \quad (x>r,\ \theta\in\mathbb{R}_+)$$

with respect to the Lebesgue measure on $\chi=(r,\infty)$, $r>0$.


before 


where $\mathbb{Z}_+=\{0,1,2,\dots\}$. Then

$$S(a,b)=2\,|\sqrt{a}-\sqrt{b}| \quad (a,b\in\mathbb{R}_+).$$


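8°. Negative Binomial Distribution.

$$p(x|\theta)=\frac{\Gamma(x+r)}{x!\,\Gamma(r)}\,\theta^x(1-\theta)^r \quad (x\in\mathbb{Z}_+,\ 0<\theta<1)$$

with index r>0 and $\Theta=(0,1)$. The Rao distance is

$$S(a,b)=2\sqrt{r}\,\cosh^{-1}\left\{\frac{1-\sqrt{ab}}{\sqrt{(1-a)(1-b)}}\right\} \quad (a,b\in\Theta).$$

Alternatively,

$$S(a,b)=2\sqrt{r}\,\log\frac{1-\sqrt{ab}+|\sqrt{a}-\sqrt{b}|}{\sqrt{(1-a)(1-b)}} \quad (a,b\in\Theta).$$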


9°. Binomial Distribution.

$$p(x|\theta)=\binom{N}{x}\theta^x(1-\theta)^{N-x} \quad (x\in\{0,1,\dots,N\},\ 0<\theta<1)$$

with $\Theta=(0,1)$ and N≥1 is an integer. In this case

$$S(a,b)=2\sqrt{N}\,\cos^{-1}\{\sqrt{ab}+\sqrt{(1-a)(1-b)}\}$$

or, equivalently,

$$S(a,b)=2\sqrt{N}\,|\sin^{-1}\sqrt{a}-\sin^{-1}\sqrt{b}| \quad (a,b\in\Theta).$$

The distance without the factor $2\sqrt{N}$ is also called the
"Hellinger-Bhattacharyya distance" (see [3,5,12]) for the binomial
distribution.
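The paired closed forms for the negative binomial and binomial families can be compared numerically. The sketch below implements the inverse hyperbolic cosine and logarithmic forms for the negative binomial family and the inverse cosine and inverse sine forms for the binomial family; the function names are hypothetical, and the clamping of arguments to the valid domain is a numerical safeguard added here, not part of the formulas.

```python
import math

def negbin_rao_acosh(a, b, r):
    x = (1.0 - math.sqrt(a * b)) / math.sqrt((1.0 - a) * (1.0 - b))
    return 2.0 * math.sqrt(r) * math.acosh(max(1.0, x))   # guard against x slightly below 1

def negbin_rao_log(a, b, r):
    num = 1.0 - math.sqrt(a * b) + abs(math.sqrt(a) - math.sqrt(b))
    return 2.0 * math.sqrt(r) * math.log(num / math.sqrt((1.0 - a) * (1.0 - b)))

def binomial_rao_acos(a, b, N):
    c = math.sqrt(a * b) + math.sqrt((1.0 - a) * (1.0 - b))
    return 2.0 * math.sqrt(N) * math.acos(min(1.0, c))    # guard against c slightly above 1

def binomial_rao_asin(a, b, N):
    return 2.0 * math.sqrt(N) * abs(math.asin(math.sqrt(a)) - math.asin(math.sqrt(b)))

print(negbin_rao_acosh(0.2, 0.7, 3.0), negbin_rao_log(0.2, 0.7, 3.0))
print(binomial_rao_acos(0.2, 0.7, 5), binomial_rao_asin(0.2, 0.7, 5))
```

The agreement within each pair follows from cosh⁻¹x = log[x + √(x²-1)] and from the addition formula for the cosine.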


4.2. Bivariate Distributions.

Here $\Theta$, for $p(\cdot|\theta)\in P_\Theta$, is of dimension n=2.
In this case, the first information curvature tensor $R_{ijk\ell}$ has
only one independent component $R_{1212}$. The latter determines the
Gaussian curvature $\kappa$. As an example, we describe the
2-dimensional geometry of the classical normal distribution
$N(\mu,\sigma^2)$ with means $\mu$ and variances $\sigma^2$
$(\mu\in\mathbb{R},\ \sigma\in\mathbb{R}_+)$. Other examples are
described in 4.3 below.

For the normal distribution

$$p(x|\theta)=N(x|\mu,\sigma^2)=\frac{1}{(2\pi)^{1/2}\sigma}\,e^{-(x-\mu)^2/2\sigma^2} \quad (x\in\mathbb{R}),$$

we have $\Theta=\mathbb{R}\times\mathbb{R}_+$ and
$\theta=(\mu,\sigma)\in\Theta$. The information metric is

$$ds^2=2\sigma^{-2}\Big[\Big(\frac{d\mu}{\sqrt{2}}\Big)^2+(d\sigma)^2\Big],$$

and the curvature is $\kappa=-\tfrac12$. Letting
$\mu^*=\mu/\sqrt{2}$ and introducing the complex variable
$z=\mu^*+i\sigma$ we find that $\Theta$ becomes the upper-half plane
$\{z\in\mathbb{C}:\operatorname{Im}z>0\}$ and
$ds^2=2\sigma^{-2}\,dz\,d\bar z$, effectively the Poincaré metric. The
geodesic curves are the "semi-circles"

$$z=a+re^{i\phi},\qquad r>0,\ 0<\phi<\pi,$$

where $a$ is a real constant. This family includes the half-lines
$\operatorname{Re}z=\text{const.}$, $z\in\Theta$, as limiting cases,
corresponding to $r\to\infty$. The Rao distance $S(1,2)$ between
$(\mu_1,\sigma_1^2)$ and


where 


(  J-AJU  \2  1/2 

(u--u0)  +2(a1-a0) 


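$\mathbb{R}^n$. If $y=(y_1,\dots,y_n)$ is another vector in
$\mathbb{R}^n$, then

$$\langle\theta,y\rangle=\theta_1y_1+\dots+\theta_ny_n.$$

The vector $(1,\dots,1)$ of $\mathbb{R}^n$ is denoted by $\mathbf{1}$.

At the present, the sample space $\chi$ is a subset of
$\mathbb{Z}_+^n$ with a counting measure and the parameter space is of
the form

$$\Theta_\rho=\{\theta\in\mathbb{R}^n_+:|\theta|<\rho\} \quad (0<\rho\le\infty).$$

A typical example is as follows: Let $F$ be analytic with the power
expansion

(4.2)  $$F(t)=\sum_{m=0}^\infty b_mt^m \quad (-\rho<t<\rho),$$

such that $b_m>0$ for every $m\in\mathbb{Z}_+$. Consider the
probability distribution

(4.3)  $$p(\alpha|\theta)=\frac{|\alpha|!}{\alpha!}\,b_{|\alpha|}\,\frac{\theta^\alpha}{F(|\theta|)}.$$

The metric tensor is then

$$g_{ij}(\theta)=f(|\theta|)\,[\theta_i^{-1}\delta_{ij}+h(|\theta|)] \quad (\theta\in\Theta_\rho),$$

where

$$f(t)=(\log F)'(t),\qquad h(t)=(\log f)'(t).$$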

The  first  information  curvature  tensor  in  0  is  given  by 


W9)=-M9iV'  « I0  D{h(  I6  !)■ <5ik6  jfhhv) 


1  ( I  e  I }  6i  (6  j  r6  jk)+h  ’  ( !  e  | )  e  j  (6ik-«ia)  > , 

while  the  information  curvature  is 


ic(0  :x,y)= - - - «Cf(|e|)£((|e|) 

4[f(|el)r 


+H(|e|)fh(|0|)+ 


_ A(x,y  ,y) _ ,-1-, 

<y  ,0>A(x,y  ,I)+<x,0:>A(y  ,x.I) j 


where 


H(t)=f (t)f"(t)-2[f ’ (t) ]2 


and 

A(x,y,z)=<x,x><y ,z>-<x,y><x,z>  (x,y,zeRn) . 

This curvature is constant if and only if it is isotropic, i.e. if and
only if $H(t)=0$ for $-\rho<t<\rho$. The latter is equivalent to either
$F(t)=ae^{rt}$ or $F(t)=a(b-t)^{-r}$ where $a$, $b$ and $r$ are
positive constants with $\rho\le b<\infty$. This gives, effectively,
either the independent Poisson distribution with $\kappa=0$ or the
negative multinomial distribution with $\kappa=-(4r)^{-1}$ (see 1° and
2° below).

To  find  the  geodesic  curves  for  the  distribution  in 
(4.3),  we  introduce  the  additional  functions. 


L(t)=‘  I (t j+tf' (tyt ’ (t)h(t) ] 


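2°. Negative Multinomial Distributions.

$$p(\alpha|\theta)=\frac{\Gamma(|\alpha|+r)}{\alpha!\,\Gamma(r)}\,\theta^\alpha(1-|\theta|)^r \quad (\alpha\in\mathbb{Z}^n_+)$$

with index r>0. This is a special case of (4.2), with
$F(t)=(1-t)^{-r}$. The metric tensor is

$$g_{ij}=\frac{r}{1-|\theta|}\Big[\frac{\delta_{ij}}{\theta_i}+\frac{1}{1-|\theta|}\Big] \quad (\theta\in\Theta_1),$$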


and 


Rijk£ 


4(1- | 9 |)  9.0. 
+0j(6ik-6u)]} 


-[0,(6,  -6,i  ) 


2„  .  "ik°jl^i£ujkT3^lVuj* 


with 


It follows that $\Theta_1$ with the above metric is locally isometric
to the "Poincaré hyperbolic space". In particular, for any two points
$a,b\in\Theta_1$ there exists a unique geodesic curve in $\Theta_1$,
with respect to the metric, connecting the points (see, for example,
Hicks [18]).

The  geodesic  curves  9=0 (s)  are  given  by  the  "hyperbolas 

9^={A^tanh—  (s+B^^+B^J^  (k=l, . . .  ,n) 

2/r 

where  A.,  B. (j=l, . . . ,n)  and  B  .  are  constants  satisfying 


This  family  includes  the  lines  0^*B^(k=l, . . . ,n)  as  limiting 
cases. The Rao distance is

$$S(a,b)=2\sqrt{r}\,\cosh^{-1}\left\{\frac{1-\sum_{k=1}^n\sqrt{a_kb_k}}{\sqrt{(1-|a|)(1-|b|)}}\right\} \quad (a,b\in\Theta_1).$$

An alternative expression may be obtained by using the identity
$\cosh^{-1}x=\log[x+\sqrt{x^2-1}]$, $x\ge 1$. The expressions agree
with those in 4.1(8°) when n=1 (see also Oller and Cuadras [22]).


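3°. Multinomial Distributions.

$$p(\alpha|\theta)=\frac{N!}{\alpha!\,(N-|\alpha|)!}\,\theta^\alpha(1-|\theta|)^{N-|\alpha|} \quad (\alpha\in\mathbb{Z}^n_+,\ |\alpha|\le N;\ \theta\in\Theta_1)$$

with an integer N≥1 and sample space
$\chi=\{\alpha\in\mathbb{Z}^n_+:|\alpha|\le N\}$. The metric tensor is

$$g_{ij}=N\Big[\frac{\delta_{ij}}{\theta_i}+\frac{1}{1-|\theta|}\Big] \quad (\theta\in\Theta_1),$$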


and 


Rijkt=  i^tVV'sjk)+V5ik"5i*)11 


with 


This gives, effectively, the spherical geometry. In fact, upon putting $\theta_{n+1}=1-|\theta|$ and introducing the new variables $y_i=\theta_i^{1/2}$ $(i=1,\ldots,n+1)$, we find that the metric becomes

    $ds^2(y)=4N\sum_{i=1}^{n+1}(dy_i)^2$

and $\Theta_n$ is mapped onto the positive portion of the (n+1)-dimensional unit sphere $Y=\{y=(y_1,\ldots,y_{n+1})\in\mathbb{R}_+^{n+1}: y_1^2+\cdots+y_{n+1}^2=1\}$. It follows from the spherical representation that the geodesic curves $\theta=\theta(s)$ are the "great circles"

    $\theta_k(s)=\Big(A_k\cos\dfrac{s}{2\sqrt{N}}+B_k\sin\dfrac{s}{2\sqrt{N}}\Big)^2$   $(k=1,\ldots,n+1)$

with

    $\theta_{n+1}=1-(\theta_1+\cdots+\theta_n)$

and with constants $(A_1,\ldots,A_{n+1})$, $(B_1,\ldots,B_{n+1})$ satisfying

    $\sum_{k=1}^{n+1}A_k^2=\sum_{k=1}^{n+1}B_k^2=1$,   $\sum_{k=1}^{n+1}A_kB_k=0$.

The Rao distance is

    $S(a,b)=2\sqrt{N}\,\cos^{-1}\Big(\sum_{k=1}^{n+1}\sqrt{a_kb_k}\Big)$   $(a,b\in\Theta_n)$

with

    $a_{n+1}=1-|a|$,   $b_{n+1}=1-|b|$,

effectively the "Hellinger-Bhattacharyya distance" [5,6] (see also [3,24]). This agrees with 4.1(9°) when $n=1$.
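The multinomial Rao distance $S(a,b)=2\sqrt{N}\cos^{-1}(\sum_{k=1}^{n+1}\sqrt{a_kb_k})$, with $a_{n+1}=1-|a|$ and $b_{n+1}=1-|b|$, can be evaluated directly. The sketch below is only illustrative; the function name is hypothetical, and boundary points of the simplex are admitted for convenience.

```python
import math


def rao_multinomial(a, b, N):
    """Rao distance between multinomial parameter vectors a and b (index N).

    a, b list the first n cell probabilities; the (n+1)-th is 1 - sum.
    """
    a_full = list(a) + [1.0 - sum(a)]
    b_full = list(b) + [1.0 - sum(b)]
    # Bhattacharyya coefficient, i.e. the inner product of the points
    # y = sqrt(theta) on the unit sphere.
    bc = sum(math.sqrt(x * y) for x, y in zip(a_full, b_full))
    bc = min(1.0, max(0.0, bc))  # clip rounding noise before acos
    return 2.0 * math.sqrt(N) * math.acos(bc)
```

With $n=1$, $N=4$ and the two extreme points $a=0$, $b=1$, the square-root images are orthogonal on the sphere and the distance is $2\sqrt{4}\cdot\pi/2=2\pi$.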


The nondiscrete cases that we shall describe here are those associated with the normal distribution

    $N(x|\mu,\Sigma)=(2\pi)^{-n/2}|\Sigma|^{-1/2}\exp\{-\tfrac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\}$   $(x\in\mathbb{R}^n)$,

with mean (column) vector $\mu$ and a variance-covariance matrix $\Sigma$. We shall use the standard matrix notation: $M(n,\mathbb{R})$ is the space of all $n\times n$ real matrices, $S(n,\mathbb{R})=\{A\in M(n,\mathbb{R}): A'=A\}$ the subspace of symmetric matrices, $GL(n,\mathbb{R})$ the group of all nonsingular matrices in $M(n,\mathbb{R})$, $P(n,\mathbb{R})$ the subset of all positive-definite symmetric matrices in $GL(n,\mathbb{R})$. The inner product and norm on $M(n,\mathbb{R})$ are given by

    $\langle A,B\rangle=\operatorname{tr}(AB')$,   $\|A\|=\{\langle A,A\rangle\}^{1/2}$   $(A,B\in M(n,\mathbb{R}))$,

and

    $[A,B]=AB-BA$   $(A,B\in M(n,\mathbb{R}))$

stands for the commutator of $A$ and $B$.

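For the normal distribution $p(\cdot|\theta)=N(\cdot|\mu,\Sigma)$, $\Theta$ is $\mathbb{R}^n\times P(n,\mathbb{R})$ and $\theta=(\mu,\Sigma)\in\Theta$. The information metric is then

(4.4)   $ds^2=(d\mu)'\Sigma^{-1}(d\mu)+\tfrac{1}{2}\operatorname{tr}\{(\Sigma^{-1}d\Sigma)^2\}$   $((\mu,\Sigma)\in\Theta)$.

We note that

(4.5)   $\operatorname{tr}\{(\Sigma^{-1}d\Sigma)^2\}=\|\Sigma^{-1/2}\,d\Sigma\,\Sigma^{-1/2}\|^2$   $(\Sigma\in P(n,\mathbb{R}))$.

The geodesic curves are determined by

(4.6)   $\Sigma^{-1}\dot\mu=c$   $(c\in\mathbb{R}^n)$,
        $(\Sigma^{-1}\dot\Sigma)^{\cdot}+cc'\Sigma=0$,
        $c'\Sigma c+\tfrac{1}{2}\operatorname{tr}\{(\Sigma^{-1}\dot\Sigma)^2\}=1$,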

where $c$ is a constant vector in $\mathbb{R}^n$. We also note that for any $(a,A)$ in $\mathbb{R}^n\times GL(n,\mathbb{R})$, the mapping $(\mu,\Sigma)\mapsto(A'\mu+a,\ A'\Sigma A)$ establishes a homeomorphism of $\Theta$ onto $\Theta$ which is also an isometry with respect to the information metric $ds^2$. Consequently, the Rao distance between $(\mu_1,\Sigma_1)$ and $(\mu_2,\Sigma_2)$ in $\Theta$ satisfies

    $S(\mu_1,\Sigma_1:\mu_2,\Sigma_2)=S(A'\mu_1+a,\ A'\Sigma_1A : A'\mu_2+a,\ A'\Sigma_2A)$

for any $(a,A)\in\mathbb{R}^n\times GL(n,\mathbb{R})$. In particular, the above Rao distance $S(1,2)$ admits the form

    $S(1,2)=S(0,\ I : \Sigma_1^{-1/2}(\mu_2-\mu_1),\ \Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2})$.

Explicit  expressions  for  the  geodesic  curves  and  the  Rao 
distance  in  this  general  setting  are  not  available.  We 
therefore  only  describe  some  special  cases. 

4°. Fixed-Variance-Covariance Normal Distributions.

In this case we consider the family of normal distributions $N(\cdot|\mu,\Sigma_0)$ with a fixed variance-covariance matrix $\Sigma_0$. Here $\Theta$ is $\mathbb{R}^n$ and the information metric is

    $ds^2=(d\mu)'\Sigma_0^{-1}(d\mu)$   $(\mu\in\mathbb{R}^n)$,

which is essentially the euclidean metric on $\mathbb{R}^n$, since $\Sigma_0\in P(n,\mathbb{R})$ is constant. The Rao distance is therefore

    $S(\mu_1,\mu_2)=\{(\mu_1-\mu_2)'\Sigma_0^{-1}(\mu_1-\mu_2)\}^{1/2}$   $(\mu_1,\mu_2\in\mathbb{R}^n)$.

This is the familiar "Mahalanobis distance" [21] and it agrees with the distance in 4.1(6°) when $n=1$. Note, however, that, as is mentioned also in 4.2, the present distance cannot be regarded as the restriction of the Rao distance for the entire manifold $\mathbb{R}^n\times P(n,\mathbb{R})$ to $\mathbb{R}^n\times\{\Sigma_0\}$. This is because the curve $(\mu,\Sigma_0)$, as (4.6) shows, is not a geodesic curve of the metric in (4.4).

5°. Fixed-Mean Vector Normal Distributions.

Here we consider the family of normal distributions $N(\cdot|\mu_0,\Sigma)$ with a fixed mean vector $\mu_0$. In this case $\Theta$ is $P(n,\mathbb{R})$ and the information metric is

(4.7)   $ds^2=\tfrac{1}{2}\operatorname{tr}\{(\Sigma^{-1}d\Sigma)^2\}$   $(\Sigma\in P(n,\mathbb{R}))$.

Moreover, the geodesic curves $\Sigma=\Sigma(s)$ of this metric may be determined from (4.6) with $c=0$. This gives $(\Sigma^{-1}\dot\Sigma)^{\cdot}=0$ and the normalization $\operatorname{tr}\{(\Sigma^{-1}\dot\Sigma)^2\}=2$. The solution $\Sigma(s)$ must belong to $P(n,\mathbb{R})$. Consequently the most general geodesic curve is of the form

(4.8)   $\Sigma(s)=A'e^{sB}A$

where $A$ and $B$ are constant matrices such that

(4.9)   $A\in GL(n,\mathbb{R})$,   $B\in S(n,\mathbb{R})$,   $\|B\|^2=\operatorname{tr}(B^2)=2$.


The group of automorphisms $G$ of $P(n,\mathbb{R})$ onto itself is generated by $Z\mapsto Z^{-1}$ and $Z\mapsto A'ZA$ where $A\in GL(n,\mathbb{R})$. This group is "transitive", i.e. for any $Z_1,Z_2\in P(n,\mathbb{R})$ there exists an $f\in G$ such that $f(Z_1)=Z_2$ (just choose $A=Z_1^{-1/2}Z_2^{1/2}\in GL(n,\mathbb{R})$). Moreover, for a given $Z_0\in P(n,\mathbb{R})$, the automorphism $f\in G$ given by $f(Z)=Z_0Z^{-1}Z_0$, $Z\in P(n,\mathbb{R})$, satisfies $f=f^{-1}$ and $f(Z_0)=Z_0$. Consequently, the parameter space $\Theta=P(n,\mathbb{R})$ is a "symmetric space". It is also easily verified that the group $G$ coincides with the group of isometries of the metric $ds^2$ on $P(n,\mathbb{R})$. The group $G$ forms a subgroup of the "Siegel symplectic group" [30] which acts on the "Siegel upper-half space" $S(n,\mathbb{R})+iP(n,\mathbb{R})$. This gives an alternative description for the geodesic curves which is equivalent to that of (4.8)-(4.9), namely that the most general geodesic curve $Z=Z(s)$ is of the form

    $f(Z(s))=\operatorname{diag}[e^{\nu_1s},\ldots,e^{\nu_ns}]$   $(f\in G)$

where $f$ is an arbitrary automorphism of $P(n,\mathbb{R})$ and $\nu_1,\ldots,\nu_n$ are arbitrary non-negative numbers with $\nu_1^2+\cdots+\nu_n^2=2$.

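The Rao distance $S(1,2)$ between $\Sigma_1$ and $\Sigma_2$ of $P(n,\mathbb{R})$ is easily determined from the above geodesic equations. The geodesic curve $\Sigma=\Sigma(s)$ along these two points, with $\Sigma(0)=\Sigma_1$, $\Sigma(s_0)=\Sigma_2$ and $s_0=S(1,2)$, satisfies (4.8)-(4.9) in the interval $0\le s\le s_0$ with

    $A'A=\Sigma_1$,   $e^{s_0B}=(A^{-1})'\Sigma_2A^{-1}$.

Consequently,

(4.10)   $S(1,2)=\dfrac{1}{\sqrt{2}}\,\|\log\Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2}\|$   $(\Sigma_1,\Sigma_2\in P(n,\mathbb{R}))$.

We note that

(4.11)   $\|\log\Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2}\|^2=\operatorname{tr}\{\log^2\Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2}\}=\sum_{k=1}^{n}\log^2\lambda_k$,

where

(4.12)   $\lambda_k=\lambda_k(\Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2})$   $(k=1,\ldots,n)$

are the positive eigenvalues of the positive-definite matrix $\Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2}$ (note also the symmetry between $\Sigma_1$ and $\Sigma_2$ in (4.11), since $\log^2\lambda=\log^2\lambda^{-1}$). Equivalently, the $\lambda_k$ are the eigenvalues of $\Sigma_2\Sigma_1^{-1}$, or of $\Sigma_1^{-1}\Sigma_2$, and they are determined uniquely as the solutions $\lambda=\lambda_k$ $(k=1,\ldots,n)$ of the determinantal equation

    $|\lambda\Sigma_1-\Sigma_2|=0$.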
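To make the eigenvalue form of the fixed-mean distance concrete, $S(1,2)=\{\tfrac{1}{2}\sum_k\log^2\lambda_k\}^{1/2}$ with $\lambda_k$ the roots of $|\lambda\Sigma_1-\Sigma_2|=0$, here is a minimal sketch restricted, as a simplifying assumption, to $2\times 2$ covariance matrices, where the determinantal equation is a quadratic. The function names are hypothetical.

```python
import math


def generalized_eigs_2x2(s1, s2):
    """Roots of det(lambda*s1 - s2) = 0 for symmetric positive-definite
    2x2 matrices given as nested lists [[a, b], [b, c]]."""
    (a11, a12), (_, a22) = s1
    (b11, b12), (_, b22) = s2
    det_a = a11 * a22 - a12 * a12
    det_b = b11 * b22 - b12 * b12
    # det(lambda*A - B) = det_a*lambda^2 - mid*lambda + det_b
    mid = a11 * b22 + a22 * b11 - 2.0 * a12 * b12
    disc = max(mid * mid - 4.0 * det_a * det_b, 0.0)
    root = math.sqrt(disc)
    return (mid - root) / (2.0 * det_a), (mid + root) / (2.0 * det_a)


def rao_fixed_mean_2x2(s1, s2):
    """Rao distance sqrt(0.5 * sum log^2 lambda_k) between two covariances."""
    lams = generalized_eigs_2x2(s1, s2)
    return math.sqrt(0.5 * sum(math.log(l) ** 2 for l in lams))
```

For $\Sigma_1=I$ and $\Sigma_2=\operatorname{diag}[e^2,1]$ the roots are $1$ and $e^2$, so the distance is $\{\tfrac{1}{2}\cdot 4\}^{1/2}=\sqrt{2}$. In higher dimensions a numerical generalized eigenvalue routine would take the place of the quadratic formula.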

Other alternative expressions for the Rao distance $S(1,2)$ in (4.10) are also available. For this purpose we introduce the matrix

    $T_{12}=(\Sigma_1-\Sigma_2)(\Sigma_1+\Sigma_2)^{-1}$

and define $R=R(\Sigma_1,\Sigma_2)$ by


Then $R$ is symmetric and semi positive-definite; it is positive-definite if $\Sigma_1\neq\Sigma_2$ and is zero otherwise. Consequently, the eigenvalues $r_k=r_k(R)$ of $R$ are related to the eigenvalues in (4.12) by

    $r_k=\Big(\dfrac{\lambda_k-1}{\lambda_k+1}\Big)^2$,   $0\le r_k<1$   $(k=1,\ldots,n)$.

In particular, the matrices $I+R^{1/2}$ and $I-R^{1/2}$ are members of $P(n,\mathbb{R})$. Moreover, since

    $\log^2\dfrac{1+r^{1/2}}{1-r^{1/2}}=4[\tanh^{-1}r^{1/2}]^2=4r\Big(\sum_{k=0}^{\infty}\dfrac{r^k}{2k+1}\Big)^2$   $(0\le r<1)$

and

    $\operatorname{tr}(R^j)=\sum_{k=1}^{n}r_k^j$   $(j=0,1,\ldots)$,

we have, noting that $I+R^{1/2}$ and $I-R^{1/2}$ commute,

    $\log^2\dfrac{I+R^{1/2}}{I-R^{1/2}}=4[\tanh^{-1}R^{1/2}]^2=4R\Big(\sum_{k=0}^{\infty}\dfrac{R^k}{2k+1}\Big)^2$,

and therefore

(4.13)   $S^2(1,2)=\dfrac{1}{2}\operatorname{tr}\Big\{\log^2\dfrac{I+R^{1/2}}{I-R^{1/2}}\Big\}=\dfrac{1}{2}\sum_{k=1}^{n}\log^2\dfrac{1+r_k^{1/2}}{1-r_k^{1/2}}$.

The components of the first curvature tensor at any $\Sigma\in P(n,\mathbb{R})$ are zero and thus the space $P(n,\mathbb{R})$ is essentially euclidean. The Rao distance in (4.10) or in (4.13) reduces to that in 4.1(5°) when $n=1$ (see also [3]).

6°. Independent Normal Distributions.

We consider a family of normal distributions $N(\cdot|\mu,\Sigma:\Sigma_0)$ with varying mean vectors $\mu\in\mathbb{R}^n$ and variance-covariance matrices $\Sigma\in P(n,\mathbb{R})$ that commute with a fixed variance-covariance matrix $\Sigma_0\in P(n,\mathbb{R})$, $\Sigma_0\neq I$. The parameter space $\Theta$ in this case is $\mathbb{R}^n\times P(n,\mathbb{R}:\Sigma_0)$, where

    $P(n,\mathbb{R}:\Sigma_0)=\{\Sigma\in P(n,\mathbb{R}): \Sigma\Sigma_0=\Sigma_0\Sigma\}$.

The last set contains the matrices $I$ and $\Sigma_0$, and it contains all the powers $\Sigma^m$, $m=\pm1,\pm2,\ldots$, provided $\Sigma\in P(n,\mathbb{R}:\Sigma_0)$. In addition, if $\Sigma_1$ and $\Sigma_2$ are any members of $P(n,\mathbb{R}:\Sigma_0)$ then so is


The fixed matrix $\Sigma_0$ admits the decomposition

    $\Sigma_0=U_0'\Lambda_0^2U_0$

where $U_0$ is an orthogonal matrix,

    $U_0'U_0=U_0U_0'=I$,

with positive elements in the diagonal, and $\Lambda_0$ is a diagonal matrix

    $\Lambda_0=\operatorname{diag}[\sigma_{01},\ldots,\sigma_{0n}]$   $(\sigma_{0k}>0;\ k=1,\ldots,n)$,

with positive elements. It follows from the well-known result on the simultaneous diagonalization of commuting symmetric matrices (see, for example, [27, p. 41]) that $P(n,\mathbb{R}:\Sigma_0)=P(n,\mathbb{R}:U_0)$ where

    $P(n,\mathbb{R}:U_0)=\{\Sigma\in P(n,\mathbb{R}): \Sigma=U_0'\Lambda^2U_0\}$,
    $\Lambda=\operatorname{diag}[\sigma_1,\ldots,\sigma_n]$   $(\sigma_k>0,\ k=1,\ldots,n)$.

This shows that $P(n,\mathbb{R}:\Lambda_0^2)=P(n,\mathbb{R}:I)$ is the set of all diagonal matrices in $P(n,\mathbb{R})$ and that it is isomorphic to $P(n,\mathbb{R}:\Sigma_0)$. In particular, $P(n,\mathbb{R}:\Sigma_0)$ is an n-dimensional submanifold of the full n(n+1)/2-dimensional manifold $P(n,\mathbb{R})$.

The mapping $x\mapsto U_0x$ constitutes an isometry of the sample space $X=\mathbb{R}^n$ onto itself and preserves the Lebesgue measure on $\mathbb{R}^n$. Therefore, the given family of distributions $N(\cdot|\mu,\Sigma:\Sigma_0)$ is identical with

(4.14)   $N(\cdot|\nu,\Lambda^2:U_0)=N(\cdot|\nu_1,\sigma_1^2)\cdots N(\cdot|\nu_n,\sigma_n^2)$

where

(4.15)   $\nu=(\nu_1,\ldots,\nu_n)'=U_0\mu$,   $\Lambda=\operatorname{diag}[\sigma_1,\ldots,\sigma_n]$   $(\sigma_k>0;\ k=1,\ldots,n)$.

The given distributions are therefore products of n independent univariate normal distributions, and hence, by virtue of property 1° of section 2, the analysis can be reduced to that found in 4.2. The parameter space $\Theta$ is now $(\mathbb{R}\times\mathbb{R}_+)^n$ and $\theta=[(\nu_1,\sigma_1^2),\ldots,(\nu_n,\sigma_n^2)]'\in(\mathbb{R}\times\mathbb{R}_+)^n$, and the information metric is

    $ds^2=2\sum_{k=1}^{n}\sigma_k^{-2}\Big[\Big(\dfrac{d\nu_k}{\sqrt{2}}\Big)^2+(d\sigma_k)^2\Big]$.

Letting $\nu_k^*=\nu_k/\sqrt{2}$ and introducing the complex variables $z_k=\nu_k^*+i\sigma_k$ $(k=1,\ldots,n)$ we find that $\Theta$ becomes the poly-upper-half plane $U^n=\{z=(z_1,\ldots,z_n)\in\mathbb{C}^n: \operatorname{Im}z_k>0,\ k=1,\ldots,n\}$ and

    $ds^2=2\sum_{k=1}^{n}\sigma_k^{-2}\,dz_k\,d\bar z_k$   $(z\in U^n)$,

effectively the "Poincaré metric" of $U^n$.


The above metric is "hermitian" (see [19]), i.e. it is of the form

    $ds^2=g_{k\bar j}\,dz_k\,d\bar z_j$,

where the summation convention has been used. Here $[g_{k\bar j}]$ is an $n\times n$ hermitian (i.e. $g_{k\bar j}=\overline{g_{j\bar k}}$; $k,j=1,\ldots,n$) matrix, defined on a complex manifold $M$ of a complex dimension n. For a local coordinate system of $M$ with $z_k=x_k+iy_k$ $(k=1,\ldots,n)$, we introduce the complex-derivatives

    $\partial_k=(\partial_{x_k}-i\partial_{y_k})/2$,   $\bar\partial_k=(\partial_{x_k}+i\partial_{y_k})/2$   $(k=1,\ldots,n)$

and the components of the "Ricci curvature" tensor

    $R_{k\bar j}=-2\,\partial_k\bar\partial_j\log G$,

where $G$ is the determinant of $[g_{k\bar j}]$. The components of the Riemann curvature tensor are now given by

    $R_{i\bar jk\bar\ell}=-\partial_k\bar\partial_\ell g_{i\bar j}+g^{\bar rm}\,\partial_kg_{i\bar r}\,\bar\partial_\ell g_{m\bar j}$,

while the mean Gaussian curvature is replaced by the "holomorphic sectional curvature"

    $\kappa(z:v)=\dfrac{2R_{i\bar jk\bar\ell}\,v_i\bar v_jv_k\bar v_\ell}{[g_{i\bar j}v_i\bar v_j]^2}$   $(z\in M)$,

at $z\in M$ in the direction of $v=(v_1,\ldots,v_n)\in\mathbb{C}^n$. Here $[g^{\bar rm}]$ is the matrix-inverse of $[g_{m\bar r}]$.

For the metric under consideration we have

    $g_{k\bar j}=2\sigma_k^{-2}\delta_{kj}$,   $g^{\bar kj}=\tfrac{1}{2}\sigma_k^{2}\delta_{kj}$

and

    $G=2^n\{\sigma_1\cdots\sigma_n\}^{-2}$.

Therefore

    $R_{k\bar j}=-\tfrac{1}{2}g_{k\bar j}=-\sigma_k^{-2}\delta_{kj}$.

Moreover,

    $R_{i\bar jk\bar\ell}=-\sigma_k^{-4}\delta_{ijk\ell}$,

where $\delta_{ijk\ell}$ is the tensor whose components are of value 1 if $i=j=k=\ell$ and 0 otherwise. It follows that the information holomorphic sectional curvature at $z=[(\nu_1,\sigma_1^2),\ldots,(\nu_n,\sigma_n^2)]'$ in the direction of $v=(v_1,\ldots,v_n)\in\mathbb{C}^n$ is

    $\kappa(z:v)=-\dfrac{1}{2}\,\dfrac{\sum_{k=1}^{n}\sigma_k^{-4}|v_k|^4}{\big[\sum_{k=1}^{n}\sigma_k^{-2}|v_k|^2\big]^2}$.

This curvature is independent of the mean vector $\nu=U_0\mu$, and it varies between $-1/2$ and $-1/2n$, a result consistent with 4.2.
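The stated range of the holomorphic sectional curvature, between $-1/2$ and $-1/2n$, can be checked numerically from the expression $\kappa(z:v)=-\tfrac{1}{2}\sum_k\sigma_k^{-4}|v_k|^4/[\sum_k\sigma_k^{-2}|v_k|^2]^2$. The sketch below assumes that form; the function name is hypothetical.

```python
def holomorphic_sectional_curvature(sigmas, v):
    """kappa(z:v) for the product Poincare-type metric on the poly-upper-half
    plane; sigmas are the positive standard deviations, v a nonzero complex
    direction vector of the same length."""
    num = sum(abs(vk) ** 4 / s ** 4 for s, vk in zip(sigmas, v))
    den = sum(abs(vk) ** 2 / s ** 2 for s, vk in zip(sigmas, v)) ** 2
    return -0.5 * num / den
```

A direction along a single coordinate gives $-1/2$ regardless of $\sigma$, whereas for $n=2$, $\sigma_1=\sigma_2=1$ and $v=(1,1)$ the value is $-1/4=-1/2n$. The means $\nu_k$ do not enter at all.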

The geodesic curves are the product of the "semi-circles"

    $\nu_k=\sqrt{2}\,(a_k+r_k\cos\phi_k)$,   $\sigma_k=r_k\sin\phi_k$

where

    $r_k>0$,   $0<\phi_k<\pi$   $(k=1,\ldots,n)$

and $a_1,\ldots,a_n$ are real constants. This family includes products containing the half-lines $\nu_k=\text{const.}$ as limiting cases. The Rao distance $S(1,2)$ between $(\mu_1,\Sigma_1)=[(\nu_{11},\sigma_{11}^2),\ldots,(\nu_{1n},\sigma_{1n}^2)]'$ and $(\mu_2,\Sigma_2)=[(\nu_{21},\sigma_{21}^2),\ldots,(\nu_{2n},\sigma_{2n}^2)]'$, with the identification of (4.14)-(4.15), is given by

    $S(1,2)=\sqrt{2}\,\Big\{\sum_{k=1}^{n}\log^2\dfrac{1+\delta_k(1,2)}{1-\delta_k(1,2)}\Big\}^{1/2}$

where

    $\delta_k(1,2)=\Big\{\dfrac{(\nu_{1k}-\nu_{2k})^2+2(\sigma_{1k}-\sigma_{2k})^2}{(\nu_{1k}-\nu_{2k})^2+2(\sigma_{1k}+\sigma_{2k})^2}\Big\}^{1/2}$   $(k=1,\ldots,n)$.
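The distance between two products of independent univariate normals, $S(1,2)=\sqrt{2}\{\sum_k\log^2[(1+\delta_k)/(1-\delta_k)]\}^{1/2}$ with $\delta_k^2=[(\nu_{1k}-\nu_{2k})^2+2(\sigma_{1k}-\sigma_{2k})^2]/[(\nu_{1k}-\nu_{2k})^2+2(\sigma_{1k}+\sigma_{2k})^2]$, is straightforward to compute. The following sketch assumes that each distribution is supplied as a list of hypothetical `(nu_k, sigma_k)` pairs with `sigma_k > 0`.

```python
import math


def delta_k(nu1, s1, nu2, s2):
    """The quantity delta_k(1,2) for one coordinate (s1, s2 are standard
    deviations, not variances)."""
    num = (nu1 - nu2) ** 2 + 2.0 * (s1 - s2) ** 2
    den = (nu1 - nu2) ** 2 + 2.0 * (s1 + s2) ** 2
    return math.sqrt(num / den)


def rao_independent_normals(p1, p2):
    """Rao distance between two products of univariate normals.

    p1, p2 : lists of (nu_k, sigma_k) pairs of equal length.
    """
    total = 0.0
    for (nu1, s1), (nu2, s2) in zip(p1, p2):
        d = delta_k(nu1, s1, nu2, s2)
        total += math.log((1.0 + d) / (1.0 - d)) ** 2
    return math.sqrt(2.0 * total)
```

With equal means and $\sigma_2/\sigma_1=e$ in a single coordinate, $(1+\delta)/(1-\delta)=e$ and the distance is $\sqrt{2}$, the familiar value $\sqrt{2}\,|\log(\sigma_2/\sigma_1)|$ of the univariate case.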


This distance reduces to that in 4.2 when $n=1$ (see also [7] for additional details).

5. Hilbert Space Embedding.

The intrinsic geometry of a space of distributions may be represented by means of an embedding in a Hilbert space. In order to describe this abstract approach we shall introduce some further notation.

For $\alpha\in\mathbb{R}$ we let

    $L^{\alpha}=\{p\in M: \int_X|p|^{\alpha}\,d\mu<\infty\}$   $(\alpha\neq0)$,
    $L^{0}=\{p\in M: \int_X(\log|p|)^2\,d\mu<\infty\}$,

and for $0<r<\infty$ we also consider the subsets

    $L^{\alpha}(r)=\{p\in L^{\alpha}: \int_X|p|^{\alpha}\,d\mu=r^{\alpha}\}$   $(\alpha\neq0)$,
    $L^{0}(r)=\{p\in L^{0}: \int_X(\log|p|)^2\,d\mu=r^2\}$.

In this notation, $L^2$ is a Hilbert space with the inner product and norm

    $(p,q)=\int_Xpq\,d\mu$,   $\|p\|=(p,p)^{1/2}$   $(p,q\in L^2)$,

and $L^2(r)$ is the sphere of radius $r$ in $L^2$. We also define

    $L_+^{\alpha}=L^{\alpha}\cap M_+$,   $P^{\alpha}(r)=L^{\alpha}(r)\cap M_+$,

and we write $P^{\alpha}$ for $P^{\alpha}(1)$. Thus, in the notation of Section 1, $L_+=L_+^1$ and $P=P^1$.

For $\alpha\in\mathbb{R}$ and $p\in M_+$ we define

(5.1)   $T_{\alpha}(p)=\dfrac{2}{|\alpha|}\,p^{\alpha/2}$   $(\alpha\neq0)$,   $T_0(p)=\log p$   $(p\in M_+)$.

Then $T_{\alpha}$ is a bijection of $M_+$ onto $M_+$ for any $\alpha\neq0$ while $T_0$ is a bijection of $M_+$ onto $M$. Moreover, $T_{\alpha}$ embeds $L_+^{\alpha}$ into $L^2$ with

(5.2)   $T_{\alpha}(L_+^{\alpha})=L_+^2$,   $T_{\alpha}(P^{\alpha})=P^2\big(\tfrac{2}{|\alpha|}\big)$   $(\alpha\neq0)$,

(5.3)   $T_0(L_+^0)=L^2$,   $T_0(P^0)=L^2(1)$.

The induced distance on $L_+^{\alpha}$ is

(5.4)   $\rho_{\alpha}(p_1,p_2)=\dfrac{2}{|\alpha|}\,\|p_1^{\alpha/2}-p_2^{\alpha/2}\|$   $(p_1,p_2\in L_+^{\alpha},\ \alpha\neq0)$,

(5.5)   $\rho_0(p_1,p_2)=\|\log p_1-\log p_2\|$   $(p_1,p_2\in L_+^0)$.

Here, the distance $\rho_1(p_1,p_2)=2\|\sqrt{p_1}-\sqrt{p_2}\|$ is known as the "Hellinger distance" on $L_+=L_+^1$. We also note that under some regularity conditions on $p_1,p_2\in L_+^0$ we have

    $\rho_0(p_1,p_2)=\lim_{\alpha\to0}\rho_{\alpha}(p_1,p_2)$.
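The induced distances $\rho_{\alpha}(p_1,p_2)=(2/|\alpha|)\|p_1^{\alpha/2}-p_2^{\alpha/2}\|$ and $\rho_0(p_1,p_2)=\|\log p_1-\log p_2\|$, together with the limit $\rho_0=\lim_{\alpha\to0}\rho_{\alpha}$, can be illustrated on a finite sample space. The sketch below takes $\mu$ to be counting measure on finitely many points, which is an assumption made only for the example; the function name is hypothetical.

```python
import math


def rho_alpha(p1, p2, alpha):
    """Induced distance rho_alpha between two positive functions given by
    their values on a finite sample space (counting measure assumed)."""
    if alpha == 0:
        sq = sum((math.log(x) - math.log(y)) ** 2 for x, y in zip(p1, p2))
        return math.sqrt(sq)
    h = alpha / 2.0
    sq = sum((x ** h - y ** h) ** 2 for x, y in zip(p1, p2))
    return (2.0 / abs(alpha)) * math.sqrt(sq)
```

For `p1 = [1.0, 4.0]` and `p2 = [4.0, 1.0]` the Hellinger case gives `rho_alpha(p1, p2, 1.0)` equal to $2\sqrt{2}$, and `rho_alpha(p1, p2, 1e-6)` is already close to `rho_alpha(p1, p2, 0)`, which equals $\sqrt{2}\log 4$.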

Let $\theta=(\theta_1,\theta_2,\ldots)'$ be a set of real continuous parameters belonging to a parameter space $\Theta$, a manifold embedded in some $\mathbb{R}^n$, $1\le n\le\infty$. Here $\mathbb{R}^{\infty}=\ell^2=\{(a_1,a_2,\ldots)': \sum_{k=1}^{\infty}a_k^2<\infty,\ a_k\in\mathbb{R},\ k=1,2,\ldots\}$. Let $F_{\Theta}^{\alpha}=\{p(\cdot|\theta)\in L_+^{\alpha}: \theta\in\Theta\}$, $\alpha\in\mathbb{R}$, be a parametric family of positive distributions $p=p(\cdot|\theta)$, $\theta\in\Theta$, having suitable regularity properties. For example, we assume

    $\partial_ip=\partial p(\cdot|\theta)/\partial\theta_i$   $(p=p(\cdot|\theta)\in F_{\Theta}^{\alpha})$

is in $M$ for every $\theta\in\Theta$ and each $i=1,2,\ldots$. We also consider the subfamily $P_{\Theta}^{\alpha}=F_{\Theta}^{\alpha}\cap P^{\alpha}$ of $F_{\Theta}^{\alpha}$.

On $F_{\Theta}^{\alpha}$ we have

    $\rho_{\alpha}^2(p(\cdot|\theta),\ p(\cdot|\theta+d\theta))=ds_{\alpha}^2(\theta)$,

to the second order infinitesimal displacements. Here $ds_{\alpha}^2$ is the α-order information metric

    $ds_{\alpha}^2(\theta)=\int_Xp^{\alpha}(d\log p)^2\,d\mu$   $(\alpha\in\mathbb{R})$,

where the dependence on $x\in X$ and $\theta\in\Theta$ in the integral has been suppressed. This may also be written as

    $ds_{\alpha}^2(\theta)=d\theta'\,I_{\alpha}(\theta)\,d\theta$

where

    $I_{\alpha}(\theta)=[g_{ij}^{(\alpha)}(\theta)]$

is the α-order information matrix with

    $g_{ij}^{(\alpha)}(\theta)=\int_Xp^{\alpha}(\partial_i\log p)(\partial_j\log p)\,d\mu$.

This matrix is always semi positive-definite. It is positive-definite at $\theta\in\Theta$ if and only if the functions $\{\partial_ip\}$ are linearly independent over $X$. Note that $ds^2(\theta)=ds_1^2(\theta)$ and $I(\theta)=I_1(\theta)$ are the ordinary information metric and information matrix, respectively.

The geometries of $F_{\Theta}^{\alpha}$ and $P_{\Theta}^{\alpha}$ $(\alpha\in\mathbb{R})$ under the α-order metric $ds_{\alpha}^2$ may be read off from the embedding $T_{\alpha}$ of $L_+^{\alpha}$ into $L^2$

(5.6)   $q=T_{\alpha}(p)$   $(p\in L_+^{\alpha})$.

We then have

(5.7)   $ds_{\alpha}^2(p)=ds_2^2(q)=\|dq\|^2$   $(q\in L^2)$.

Here the parameter space $\Theta$ may be taken as a subset of $L^2$. The coordinates of a point $q$ in $L^2$ may be determined by any orthonormal basis $(e_1,e_2,\ldots)$ of $L^2$ via the Fourier-coefficients

(5.8)   $\xi_k=(q,e_k)$   $(k=1,2,\ldots)$.

In this way the point $q\in L^2$ is identified with the point $(\xi_1,\xi_2,\ldots)'\in\ell^2$ and we have

(5.9)   $\|q\|^2=\sum_{k=1}^{\infty}\xi_k^2$   $(q\in L^2)$.

When $\Theta$ is $L^2$, the geometry under $ds_2^2$ is the usual euclidean geometry. The Riemann-Christoffel tensor of the first kind is identically zero and the geodesic curves $q[s]=q(\cdot|s)\in L^2$ are the "straight lines"

    $q[s]=as+b$   $(q[s]\in L^2,\ s\in\mathbb{R})$

where $a$ and $b$ are parameter-independent functions in $L^2$. The geodesic distance is then

    $S_2(q_1,q_2)=\rho_2(q_1,q_2)=\|q_1-q_2\|$   $(q_1,q_2\in L^2)$.

When, on the other hand, $\Theta$ is $L^2(r)$ $(0<r<\infty)$, the geometry under $ds_2^2$ is the spherical geometry. In this case, the Riemann-Christoffel tensor of the first kind is given by

(5.10)   $R_2(x,y:u,v)=(1/4)\{(x,u)(y,v)-(x,v)(y,u)\}$

where $x,y,u,v\in L^2$. The mean Gaussian curvature is then

(5.11)   $\kappa_2(x,y)=\dfrac{R_2(x,y:x,y)}{\|x\|^2\|y\|^2-[(x,y)]^2}=1/4$   $(x,y\in L^2)$.

To find the geodesic curves $q=q[s]$ $(0\le s\le L)$ of this spherical geometry, we determine the solutions of the first variation equation

    $\delta\int_0^L\|\dot q[s]\|\,ds=0$,

subject to the constraint

(5.12)   $\|q[s]\|=r$   $(0\le s\le L)$.

Here $s$ is the arc-length parameter and thus we also have the normalization

(5.13)   $\|\dot q[s]\|=1$   $(0\le s\le L)$.

For this purpose, we consider the Lagrangian

    $G(q,\dot q)=\|\dot q[s]\|+\lambda(s)\,[\,\|q[s]\|^2-r^2\,]$

with the Lagrange multiplier $\lambda(s)$. Using an orthonormal basis $(e_1,e_2,\ldots)$ of $L^2$, the Lagrangian may be represented with the aid of (5.8)-(5.9) as

(5.14)   $G(q,\dot q)=\Big\{\sum_{k=1}^{\infty}\dot\xi_k^2\Big\}^{1/2}+\lambda(s)\Big\{\sum_{k=1}^{\infty}\xi_k^2-r^2\Big\}$,

where $(\xi_1,\xi_2,\ldots)'\in\ell^2$ is the coordinatization of $q\in L^2$. Thus we seek the extremum of

    $\int_0^LG(q,\dot q)\,ds$

subject to the constraint (5.12) and the normalization (5.13) which, in view of (5.8)-(5.9), may be written as

(5.15)   $\sum_{k=1}^{\infty}\xi_k^2=r^2$,   $\sum_{k=1}^{\infty}\dot\xi_k^2=1$.

This extremum is determined by the Euler-Lagrange equations

    $\dfrac{\partial G}{\partial\xi_k}-\dfrac{d}{ds}\dfrac{\partial G}{\partial\dot\xi_k}=0$   $(k=1,2,\ldots)$,

where $G=G(q,\dot q)$ is given by (5.14), and the conditions in (5.15). We obtain

(5.16)   $2\lambda(s)\xi_k-\ddot\xi_k=0$   $(k=1,2,\ldots)$.

However, by the first equation of (5.15)

    $\sum_{k=1}^{\infty}\xi_k\dot\xi_k=0$,

and so

    $\sum_{k=1}^{\infty}\dot\xi_k^2+\sum_{k=1}^{\infty}\xi_k\ddot\xi_k=0$,

or, by the second equation of (5.15),

    $\sum_{k=1}^{\infty}\xi_k\ddot\xi_k=-1$.

It follows from (5.16) and (5.15) that $2\lambda(s)r^2=-1$ and that

    $\ddot\xi_k+\dfrac{\xi_k}{r^2}=0$   $(k=1,2,\ldots)$.

This shows that the geodesic curves $q=q[s]$ of $ds_2^2$ on $L^2(r)$ are the "great circles"

(5.17)   $q[s]=a\cos\dfrac{s}{r}+b\sin\dfrac{s}{r}$   $(0\le s\le L)$,

where $a$ and $b$ are parameter-independent orthogonal functions in $L^2(r)$, i.e.

(5.18)   $\|a\|=\|b\|=r$,   $(a,b)=0$.

In order to find the geodesic distance $S_2(q_1,q_2)$, with respect to $ds_2^2$, between $q_1$ and $q_2$ of $L^2(r)$, we use (5.17)-(5.18) with $q[0]=q_1$, $q[L]=q_2$ and $L=S_2(q_1,q_2)$. This gives the spherical distance

(5.19)   $S_2(q_1,q_2)=r\cos^{-1}\Big\{\dfrac{1}{r^2}(q_1,q_2)\Big\}$   $(q_1,q_2\in L^2(r))$.

The arc on the great circle in (5.17)-(5.18) connecting the two points $q_1,q_2\in L^2(r)$ admits the alternative representation

(5.20)   $\{q[s]\}^2=A\cos^2\Big(B-\dfrac{s}{r}\Big)$   $(0\le s\le L)$

where

(5.21)   $A=\Big\{q_1^2+q_2^2-2q_1q_2\cos\dfrac{L}{r}\Big\}\Big/\sin^2\dfrac{L}{r}$,

(5.22)   $B=\tan^{-1}\Big\{\Big(\dfrac{q_2}{q_1}-\cos\dfrac{L}{r}\Big)\Big/\sin\dfrac{L}{r}\Big\}$

and $L=S_2(q_1,q_2)$.
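The great-circle description can be tried out in finitely many coordinates, with vectors standing in for the Fourier coefficients of $q$. The sketch below assumes the spherical distance $S_2(q_1,q_2)=r\cos^{-1}\{(q_1,q_2)/r^2\}$ and the arc $q[s]=a\cos(s/r)+b\sin(s/r)$ with $a=q_1$ and $b=(q_2-q_1\cos(L/r))/\sin(L/r)$; the function names are hypothetical, and the two endpoints are assumed distinct and not antipodal.

```python
import math


def inner(x, y):
    """Euclidean inner product, standing in for (p, q) in L^2."""
    return sum(a * b for a, b in zip(x, y))


def sphere_distance(q1, q2, r):
    """Geodesic distance on the sphere of radius r."""
    c = inner(q1, q2) / (r * r)
    return r * math.acos(max(-1.0, min(1.0, c)))


def great_circle_point(q1, q2, r, s):
    """Point q[s] on the great-circle arc from q1 (s = 0) to q2 (s = L)."""
    L = sphere_distance(q1, q2, r)
    c, sn = math.cos(L / r), math.sin(L / r)
    # b is orthogonal to a = q1 and has norm r (requires sn != 0).
    b = [(y - x * c) / sn for x, y in zip(q1, q2)]
    return [x * math.cos(s / r) + bk * math.sin(s / r) for x, bk in zip(q1, b)]
```

For $r=2$, $q_1=(2,0)$ and $q_2=(0,2)$ the distance is $\pi$, the point at $s=\pi$ is $q_2$, and every intermediate point has norm 2.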


We now describe the geometries of $L_+^{\alpha}$ and $P^{\alpha}$ $(\alpha\in\mathbb{R})$ under the metric $ds_{\alpha}^2$. This is done, as mentioned previously, by considering the geometries of $L^2$ and $L^2(r)$ under $ds_2^2$ and using the embedding in (5.1) with (5.2)-(5.3) and (5.6)-(5.7). Here $r=r_{\alpha}$ with $r_{\alpha}=2/|\alpha|$ for $\alpha\neq0$ and $r_0=1$.

The previous analysis shows that the geometry of $L_+^{\alpha}$ under $ds_{\alpha}^2$ is essentially euclidean. Thus the Riemann-Christoffel tensor of the first kind is identically zero and the geodesic curves in $L_+^{\alpha}$ have the following description: For $\alpha=0$, we have

    $p[s]=ba^{s}$   $(s\in\mathbb{R})$

where $a$ and $b$ are parameter-independent functions in $L_+$. For $\alpha\neq0$, on the other hand, we have

    $p[s]=(as+b)^{2/\alpha}$   $(0\le s<\infty)$

where $a$ and $b$ are parameter-independent functions in $L_+$. The geodesic distance is then

    $S_{\alpha}(p_1,p_2)=\rho_{\alpha}(p_1,p_2)$   $(p_1,p_2\in L_+^{\alpha})$,

where $\rho_{\alpha}$ is the distance given in (5.4)-(5.5).

The geometry of $P^{\alpha}$ under $ds_{\alpha}^2$, on the other hand, is induced by the spherical representation of $L^2(r_{\alpha})$. Thus the Riemann-Christoffel tensor of the first kind is

    $R_{\alpha}(x,y:u,v)=R_2(x,y:u,v)$   $(x,y,u,v\in L^2)$,

where $R_2(x,y:u,v)$ is given by (5.10). It follows from (5.11) that the mean Gaussian curvature of $ds_{\alpha}^2$ in $P^{\alpha}$ is

    $\kappa_{\alpha}(x,y)=1/4$   $(x,y\in L^2)$.

Note that the above quantities for $\alpha=1$ give the first information curvature tensor and the information curvature, respectively.

The geodesic curves and distance on $P^{\alpha}$ with respect to $ds_{\alpha}^2$ are determined via (5.17)-(5.22) with $r=r_{\alpha}$. For $\alpha=0$, we have $r_0=1$, and the geodesic curves $p=p[s]\in P^0$ are given by

    $p[s]=\exp\{a\cos s+b\sin s\}$   $(0\le s\le L)$

where $a$ and $b$ are parameter-independent orthonormal functions in $L^2(1)$, i.e.

    $\|a\|=\|b\|=1$,   $(a,b)=0$.

Similarly, the geodesic distance on $P^0$ is then

    $S_0(p_1,p_2)=\cos^{-1}(\log p_1,\log p_2)$

or

    $S_0(p_1,p_2)=\cos^{-1}\int_X(\log p_1)(\log p_2)\,d\mu$   $(p_1,p_2\in P^0)$.

Moreover, corresponding to (5.20)-(5.22) we also have the alternative representation for the geodesic curve $p=p[s]\in P^0$ connecting the points $p_1$ and $p_2$ of $P^0$, namely

    $p[s]=\exp\{A^{1/2}\cos(B-s)\}$   $(0\le s\le L)$

where

    $A=\{(\log p_1)^2+(\log p_2)^2-2(\log p_1)(\log p_2)\cos L\}/\sin^2L$,

    $B=\tan^{-1}\Big\{\Big(\dfrac{\log p_2}{\log p_1}-\cos L\Big)\Big/\sin L\Big\}$

and $L=S_0(p_1,p_2)$.

For $\alpha\neq0$, on the other hand, $r_{\alpha}=2/|\alpha|$ and the geodesic curves $p=p[s]\in P^{\alpha}$ are given by

    $p[s]=\Big\{a\cos\dfrac{|\alpha|}{2}s+b\sin\dfrac{|\alpha|}{2}s\Big\}^{2/\alpha}$   $(0\le s\le L)$,

where $a$ and $b$ are parameter-independent orthonormal functions in $P^2$, i.e.

    $\|a\|=\|b\|=1$,   $(a,b)=0$   $(a,b\in M_+)$.

It is also assumed that

    $a\cos\dfrac{|\alpha|}{2}s+b\sin\dfrac{|\alpha|}{2}s\in M_+$   $(0\le s\le L)$.

In a similar fashion the geodesic distance on $P^{\alpha}$ is

    $S_{\alpha}(p_1,p_2)=\dfrac{2}{|\alpha|}\cos^{-1}(p_1^{\alpha/2},p_2^{\alpha/2})$

or

    $S_{\alpha}(p_1,p_2)=\dfrac{2}{|\alpha|}\cos^{-1}\int_X(p_1p_2)^{\alpha/2}\,d\mu$   $(p_1,p_2\in P^{\alpha})$.

Moreover, in correspondence with (5.20)-(5.22) the geodesic curve $p=p[s]\in P^{\alpha}$ connecting the points $p_1$ and $p_2$ of $P^{\alpha}$ admits the alternative representation

    $p[s]=A^{1/\alpha}\cos^{2/\alpha}\Big(B-\dfrac{|\alpha|}{2}s\Big)$   $(0\le s\le L)$

where

    $A=\Big\{p_1^{\alpha}+p_2^{\alpha}-2(p_1p_2)^{\alpha/2}\cos\dfrac{|\alpha|}{2}L\Big\}\Big/\sin^2\dfrac{|\alpha|}{2}L$,

    $B=\tan^{-1}\Big\{\Big[\Big(\dfrac{p_2}{p_1}\Big)^{\alpha/2}-\cos\dfrac{|\alpha|}{2}L\Big]\Big/\sin\dfrac{|\alpha|}{2}L\Big\}$

and $L=S_{\alpha}(p_1,p_2)$.

When $\alpha=1$, we find that the Rao distance on $P=P^1$ is

    $S(p_1,p_2)=S_1(p_1,p_2)=2\cos^{-1}\int_X(p_1p_2)^{1/2}\,d\mu$

which is effectively the Hellinger-Bhattacharyya distance, described in 4.3(3°). This distance was obtained previously in Rao [24] by using rather concrete and explicit methods, and later in Dawid [12] by using abstract methods.
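For a discrete sample space the α-order geodesic distance $S_{\alpha}(p_1,p_2)=(2/|\alpha|)\cos^{-1}\int_X(p_1p_2)^{\alpha/2}\,d\mu$ reduces to a finite sum, and for $\alpha=1$ it becomes the Rao distance $2\cos^{-1}\sum_x\sqrt{p_1(x)p_2(x)}$. The sketch below assumes counting measure and inputs normalized so that $\sum_xp(x)^{\alpha}=1$; the function name is hypothetical.

```python
import math


def s_alpha(p1, p2, alpha):
    """Geodesic distance on P^alpha for functions on a finite sample space.

    Assumes alpha != 0, counting measure, and sum(p**alpha) == 1 for both
    inputs, so that p**(alpha/2) lies on the unit sphere.
    """
    if alpha == 0:
        raise ValueError("alpha = 0 uses log p instead of p**(alpha/2)")
    c = sum((x * y) ** (alpha / 2.0) for x, y in zip(p1, p2))
    c = max(-1.0, min(1.0, c))  # clip rounding noise before acos
    return (2.0 / abs(alpha)) * math.acos(c)
```

Two probability vectors with disjoint supports, such as `[1.0, 0.0]` and `[0.0, 1.0]`, are at the maximal Rao distance $\pi$ when $\alpha=1$, while identical inputs are at distance 0.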




REFERENCES 


[1] Amari, S., Theory of information spaces - a geometrical foundation of the analysis of communication systems, RAAG Memoirs 4 (1968), 373-418.

[2] Amari, S., Theory of information space: a differential-geometrical foundation of statistics, RAAG Reports 106 (1980), 1-53.

[3] Atkinson, C. and Mitchell, A.F.S., Rao's distance measure, Sankhya 43 (1981), 345-365.

[4] Barndorff-Nielsen, O., Information and Exponential Families in Statistical Theory, Wiley, New York, 1978.

[5] Bhattacharyya, A., On a measure of divergence between two statistical populations, Bull. Calcutta Math. Soc. 35 (1943), 99-109.

[6] Burbea, J., J-divergences and related concepts, Encycl. Statist. Sci. 4 (1983), 290-296 (ed. Kotz-Johnson), J. Wiley, New York, 1983.

[7] Burbea, J. and Rao, C.R., Entropy differential metric, distance and divergence measures in probability spaces: a unified approach, J. Multivariate Anal. 12 (1982), 575-596.

[8] Burbea, J. and Rao, C.R., Differential metrics in probability spaces, Probability Math. Statist. 3 (1982), 115-132.

[9] Cencov, N.N., Categories of mathematical statistics (in Russian), Doklady Akad. Nauk SSSR 164 (1965), 3.

[10] Cencov, N.N., Statistical Decision Rules and Optimal Conclusions (in Russian), Nauka, Moskva, 1972.

[11] Dawid, A.P., Discussion on Professor Efron's paper (1975), Ann. Statist. 3 (1975), 1231-1234.

[12] Dawid, A.P., Further comments on some comments on a paper by Bradley Efron, Ann. Statist. 5 (1977), 1249.

[13] Efron, B., Defining the curvature of a statistical problem (with applications to second order deficiency), (with discussion), Ann. Statist. 3 (1975), 1189-1217.

[14] Efron, B., The geometry of exponential families, Ann. Statist. 6 (1978), 362-376.

[15] Eisenhart, L., Riemannian Geometry, Princeton Univ. Press, Princeton, 1926 and 1960.

[16] Eisenhart, L., An Introduction to Differential Geometry, Princeton Univ. Press, Princeton, 1940 and 1964.

[17] Fisher, R.A., Theory of statistical estimation, Proc. Camb. Phil. Soc. 22 (1925), 700-725.

[18] Hicks, N.J., Notes on Differential Geometry, Van Nostrand, Princeton, 1965.

[19] Kobayashi, S. and Nomizu, K., Foundations of Differential Geometry, Vol. II, Wiley, New York, 1968.

[20] Laugwitz, D., Differential and Riemannian Geometry, Academic Press, New York, 1965.

[21] Mahalanobis, P.C., On the generalized distance in statistics, Proc. Nat. Inst. Sci. India 12 (1936), 49-55.

[22] Oller, J.M. and Cuadras, C.M., Rao's distance for negative multinomial distributions, Sankhya (in press).

[23] Pitman, E.J.G., Some Basic Theory for Statistical Inference, Halsted Press, New York, 1979.

[24] Rao, C.R., Information and accuracy attainable in the estimation of statistical parameters, Bull. Calcutta Math. Soc. 37 (1945), 81-91.

[25] Rao, C.R., On the distance between two populations, Sankhya 9 (1949), 246-248.

[26] Rao, C.R., Efficient estimates and optimum inference procedures in large samples, (with discussion), J. Roy. Statist. Soc. B 24 (1962), 46-72.

[27] Rao, C.R., Linear Statistical Inference and its Applications.


REPORT DOCUMENTATION PAGE (DD Form 1473, 83 APR)

Security classification: UNCLASSIFIED. Distribution: approved for public release; distribution unlimited.

Performing organization: University of Pittsburgh, Center for Multivariate Analysis, 515 Thackeray Hall, Pittsburgh PA 15260. Performing organization report number: 84-52.

Monitoring organization: Air Force Office of Scientific Research, Directorate of Mathematical & Information Sciences, Bolling AFB DC 20332-6448. Monitoring organization report number: AFOSR-TR-85-0015.

Funding/sponsoring organization: AFOSR, Bolling AFB DC 20332-6448. Procurement instrument identification number: F49620-85-C-0008. Program element no. 61102F, project no. 2304, task no. A5.

Title: Informative Geometry of Probability Spaces. Personal author: Jacob Burbea.

Type of report: Technical. Date of report: DEC 84. Page count: 55.

Responsible individual: MAJ Brian W. Woodruff, telephone (202) 767-5027, office symbol NM.